arXiv:2604.06945 [pdf, ps, other]

NTIRE 2026 Challenge on Bitstream-Corrupted Video Restoration: Methods and Results

Authors: Wenbin Zou, Tianyi Liu, Kejun Wu, Huiping Zhuang, Zongwei Wu, Zhuyun Zhou, Radu Timofte, Kim-Hui Yap, Lap-Pui Chau, Yi Wang, Shiqi Zhou, Xiaodi Shi, Yuxiang Chen, Yilian Zhong, Shibo Yin, Yushun Fang, Xilei Zhu, Yahui Wang, Chen Lu, Zhitao Wang, Lifa Ha, Hengyu Man, Xiaopeng Fan, Priyansh Singh, Sidharth , et al. (15 additional authors not shown)

Abstract: This paper reports on the NTIRE 2026 Challenge on Bitstream-Corrupted Video Restoration (BSCVR). The challenge aims to advance research on recovering visually coherent videos from corrupted bitstreams, whose decoding often produces severe spatial-temporal artifacts and content distortion. Built upon recent progress in bitstream-corrupted video recovery, the challenge provides a common benchmark fo… ▽ More This paper reports on the NTIRE 2026 Challenge on Bitstream-Corrupted Video Restoration (BSCVR). The challenge aims to advance research on recovering visually coherent videos from corrupted bitstreams, whose decoding often produces severe spatial-temporal artifacts and content distortion. Built upon recent progress in bitstream-corrupted video recovery, the challenge provides a common benchmark for evaluating restoration methods under realistic corruption settings. We describe the dataset, evaluation protocol, and participating methods, and summarize the final results and main technical trends. The challenge highlights the difficulty of this emerging task and provides useful insights for future research on robust video restoration under practical bitstream corruption. △ Less

Submitted 14 April, 2026; v1 submitted 8 April, 2026; originally announced April 2026.

Comments: 15 pages, 8 figures, 1 table, CVPRW2026 NTIRE Challenge Report

arXiv:2511.13954 [pdf, ps, other]

A Brain Wave Encodes a Thousand Tokens: Modeling Inter-Cortical Neural Interactions for Effective EEG-based Emotion Recognition

Authors: Nilay Kumar, Priyansh Bhandari, G. Maragatham

Abstract: Human emotions are difficult to convey through words and are often abstracted in the process; however, electroencephalogram (EEG) signals can offer a more direct lens into emotional brain activity. Recent studies show that deep learning models can process these signals to perform emotion recognition with high accuracy. However, many existing approaches overlook the dynamic interplay between distin… ▽ More Human emotions are difficult to convey through words and are often abstracted in the process; however, electroencephalogram (EEG) signals can offer a more direct lens into emotional brain activity. Recent studies show that deep learning models can process these signals to perform emotion recognition with high accuracy. However, many existing approaches overlook the dynamic interplay between distinct brain regions, which can be crucial to understanding how emotions unfold and evolve over time, potentially aiding in more accurate emotion recognition. To address this, we propose RBTransformer, a Transformer-based neural network architecture that models inter-cortical neural dynamics of the brain in latent space to better capture structured neural interactions for effective EEG-based emotion recognition. First, the EEG signals are converted into Band Differential Entropy (BDE) tokens, which are then passed through Electrode Identity embeddings to retain spatial provenance. These tokens are processed through successive inter-cortical multi-head attention blocks that construct an electrode x electrode attention matrix, allowing the model to learn the inter-cortical neural dependencies. The resulting features are then passed through a classification head to obtain the final prediction. We conducted extensive experiments, specifically under subject-dependent settings, on the SEED, DEAP, and DREAMER datasets, over all three dimensions, Valence, Arousal, and Dominance (for DEAP and DREAMER), under both binary and multi-class classification settings. The results demonstrate that the proposed RBTransformer outperforms all previous state-of-the-art methods across all three datasets, over all three dimensions under both classification settings. The source code is available at: https://github.com/nnilayy/RBTransformer. △ Less

Submitted 17 November, 2025; originally announced November 2025.

arXiv:2511.06019 [pdf, ps, other]

MiVID: Multi-Strategic Self-Supervision for Video Frame Interpolation using Diffusion Model

Authors: Priyansh Srivastava, Romit Chatterjee, Abir Sen, Aradhana Behura, Ratnakar Dash

Abstract: Video Frame Interpolation (VFI) remains a cornerstone in video enhancement, enabling temporal upscaling for tasks like slow-motion rendering, frame rate conversion, and video restoration. While classical methods rely on optical flow and learning-based models assume access to dense ground-truth, both struggle with occlusions, domain shifts, and ambiguous motion. This article introduces MiVID, a lig… ▽ More Video Frame Interpolation (VFI) remains a cornerstone in video enhancement, enabling temporal upscaling for tasks like slow-motion rendering, frame rate conversion, and video restoration. While classical methods rely on optical flow and learning-based models assume access to dense ground-truth, both struggle with occlusions, domain shifts, and ambiguous motion. This article introduces MiVID, a lightweight, self-supervised, diffusion-based framework for video interpolation. Our model eliminates the need for explicit motion estimation by combining a 3D U-Net backbone with transformer-style temporal attention, trained under a hybrid masking regime that simulates occlusions and motion uncertainty. The use of cosine-based progressive masking and adaptive loss scheduling allows our network to learn robust spatiotemporal representations without any high-frame-rate supervision. Our framework is evaluated on UCF101-7 and DAVIS-7 datasets. MiVID is trained entirely on CPU using the datasets and 9-frame video segments, making it a low-resource yet highly effective pipeline. Despite these constraints, our model achieves optimal results at just 50 epochs, competitive with several supervised baselines.This work demonstrates the power of self-supervised diffusion priors for temporally coherent frame synthesis and provides a scalable path toward accessible and generalizable VFI systems. △ Less

Submitted 8 November, 2025; originally announced November 2025.

arXiv:2511.05667 [pdf, ps, other]

SARCH: Multimodal Search for Archaeological Archives

Authors: Nivedita Sinha, Bharati Khanijo, Sanskar Singh, Priyansh Mahant, Ashutosh Roy, Saubhagya Singh Bhadouria, Arpan Jain, Maya Ramanath

Abstract: In this paper, we describe a multi-modal search system designed to search old archaeological books and reports. This corpus is digitally available as scanned PDFs, but varies widely in the quality of scans. Our pipeline, designed for multi-modal archaeological documents, extracts and indexes text, images (classified into maps, photos, layouts, and others), and tables. We evaluated different retrie… ▽ More In this paper, we describe a multi-modal search system designed to search old archaeological books and reports. This corpus is digitally available as scanned PDFs, but varies widely in the quality of scans. Our pipeline, designed for multi-modal archaeological documents, extracts and indexes text, images (classified into maps, photos, layouts, and others), and tables. We evaluated different retrieval strategies, including keyword-based search, embedding-based models, and a hybrid approach that selects optimal results from both modalities. We report and analyze our preliminary results and discuss future work in this exciting vertical. △ Less

Submitted 7 November, 2025; originally announced November 2025.

arXiv:2510.04999 [pdf, ps, other]

Bridging Text and Video Generation: A Survey

Authors: Nilay Kumar, Priyansh Bhandari, G. Maragatham

Abstract: Text-to-video (T2V) generation technology holds potential to transform multiple domains such as education, marketing, entertainment, and assistive technologies for individuals with visual or reading comprehension challenges, by creating coherent visual content from natural language prompts. From its inception, the field has advanced from adversarial models to diffusion-based models, yielding highe… ▽ More Text-to-video (T2V) generation technology holds potential to transform multiple domains such as education, marketing, entertainment, and assistive technologies for individuals with visual or reading comprehension challenges, by creating coherent visual content from natural language prompts. From its inception, the field has advanced from adversarial models to diffusion-based models, yielding higher-fidelity, temporally consistent outputs. Yet challenges persist, such as alignment, long-range coherence, and computational efficiency. Addressing this evolving landscape, we present a comprehensive survey of text-to-video generative models, tracing their development from early GANs and VAEs to hybrid Diffusion-Transformer (DiT) architectures, detailing how these models work, what limitations they addressed in their predecessors, and why shifts toward new architectural paradigms were necessary to overcome challenges in quality, coherence, and control. We provide a systematic account of the datasets, which the surveyed text-to-video models were trained and evaluated on, and, to support reproducibility and assess the accessibility of training such models, we detail their training configurations, including their hardware specifications, GPU counts, batch sizes, learning rates, optimizers, epochs, and other key hyperparameters. Further, we outline the evaluation metrics commonly used for evaluating such models and present their performance across standard benchmarks, while also discussing the limitations of these metrics and the emerging shift toward more holistic, perception-aligned evaluation strategies. Finally, drawing from our analysis, we outline the current open challenges and propose a few promising future directions, laying out a perspective for future researchers to explore and build upon in advancing T2V research and applications. △ Less

Submitted 6 October, 2025; originally announced October 2025.

arXiv:2506.12798 [pdf, ps, other]

Predicting Genetic Mutations from Single-Cell Bone Marrow Images in Acute Myeloid Leukemia Using Noise-Robust Deep Learning Models

Authors: Garima Jain, Ravi Kant Gupta, Priyansh Jain, Abhijeet Patil, Ardhendu Sekhar, Gajendra Smeeta, Sanghamitra Pati, Amit Sethi

Abstract: In this study, we propose a robust methodology for identification of myeloid blasts followed by prediction of genetic mutation in single-cell images of blasts, tackling challenges associated with label accuracy and data noise. We trained an initial binary classifier to distinguish between leukemic (blasts) and non-leukemic cells images, achieving 90 percent accuracy. To evaluate the models general… ▽ More In this study, we propose a robust methodology for identification of myeloid blasts followed by prediction of genetic mutation in single-cell images of blasts, tackling challenges associated with label accuracy and data noise. We trained an initial binary classifier to distinguish between leukemic (blasts) and non-leukemic cells images, achieving 90 percent accuracy. To evaluate the models generalization, we applied this model to a separate large unlabeled dataset and validated the predictions with two haemato-pathologists, finding an approximate error rate of 20 percent in the leukemic and non-leukemic labels. Assuming this level of label noise, we further trained a four-class model on images predicted as blasts to classify specific mutations. The mutation labels were known for only a bag of cell images extracted from a single slide. Despite the tumor label noise, our mutation classification model achieved 85 percent accuracy across four mutation classes, demonstrating resilience to label inconsistencies. This study highlights the capability of machine learning models to work with noisy labels effectively while providing accurate, clinically relevant mutation predictions, which is promising for diagnostic applications in areas such as haemato-pathology. △ Less

Submitted 15 June, 2025; originally announced June 2025.

Comments: 2 figues

arXiv:2506.00020 [pdf, other]

doi 10.1145/3695053.3731109

Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution

Authors: Chang Eun Song, Priyansh Bhatnagar, Zihan Xia, Nam Sung Kim, Tajana Rosing, Mingu Kang

Abstract: Transformers, while revolutionary, face challenges due to their demanding computational cost and large data movement. To address this, we propose HyFlexPIM, a novel mixed-signal processing-in-memory (PIM) accelerator for inference that flexibly utilizes both single-level cell (SLC) and multi-level cell (MLC) RRAM technologies to trade-off accuracy and efficiency. HyFlexPIM achieves efficient dual-… ▽ More Transformers, while revolutionary, face challenges due to their demanding computational cost and large data movement. To address this, we propose HyFlexPIM, a novel mixed-signal processing-in-memory (PIM) accelerator for inference that flexibly utilizes both single-level cell (SLC) and multi-level cell (MLC) RRAM technologies to trade-off accuracy and efficiency. HyFlexPIM achieves efficient dual-mode operation by utilizing digital PIM for high-precision and write-intensive operations while analog PIM for high parallel and low-precision computations. The analog PIM further distributes tasks between SLC and MLC PIM operations, where a single analog PIM module can be reconfigured to switch between two operations (SLC/MLC) with minimal overhead (<1% for area & energy). Critical weights are allocated to SLC RRAM for high accuracy, while less critical weights are assigned to MLC RRAM to maximize capacity, power, and latency efficiency. However, despite employing such a hybrid mechanism, brute-force mapping on hardware fails to deliver significant benefits due to the limited proportion of weights accelerated by the MLC and the noticeable degradation in accuracy. To maximize the potential of our hybrid hardware architecture, we propose an algorithm co-optimization technique, called gradient redistribution, which uses Singular Value Decomposition (SVD) to decompose and truncate matrices based on their importance, then fine-tune them to concentrate significance into a small subset of weights. By doing so, only 5-10% of the weights have dominantly large gradients, making it favorable for HyFlexPIM by minimizing the use of expensive SLC RRAM while maximizing the efficient MLC RRAM. Our evaluation shows that HyFlexPIM significantly enhances computational throughput and energy efficiency, achieving maximum 1.86X and 1.45X higher than state-of-the-art methods. △ Less

Submitted 20 May, 2025; originally announced June 2025.

Comments: Accepted by ISCA'25

arXiv:2505.23788 [pdf, ps, other]

Nine Ways to Break Copyright Law and Why Our LLM Won't: A Fair Use Aligned Generation Framework

Authors: Aakash Sen Sharma, Debdeep Sanyal, Priyansh Srivastava, Sundar Atreya H., Shirish Karande, Mohan Kankanhalli, Murari Mandal

Abstract: Large language models (LLMs) commonly risk copyright infringement by reproducing protected content verbatim or with insufficient transformative modifications, posing significant ethical, legal, and practical concerns. Current inference-time safeguards predominantly rely on restrictive refusal-based filters, often compromising the practical utility of these models. To address this, we collaborated… ▽ More Large language models (LLMs) commonly risk copyright infringement by reproducing protected content verbatim or with insufficient transformative modifications, posing significant ethical, legal, and practical concerns. Current inference-time safeguards predominantly rely on restrictive refusal-based filters, often compromising the practical utility of these models. To address this, we collaborated closely with intellectual property experts to develop FUA-LLM (Fair Use Aligned Language Models), a legally-grounded framework explicitly designed to align LLM outputs with fair-use doctrine. Central to our method is FairUseDB, a carefully constructed dataset containing 18,000 expert-validated examples covering nine realistic infringement scenarios. Leveraging this dataset, we apply Direct Preference Optimization (DPO) to fine-tune open-source LLMs, encouraging them to produce legally compliant and practically useful alternatives rather than resorting to blunt refusal. Recognizing the shortcomings of traditional evaluation metrics, we propose new measures: Weighted Penalty Utility and Compliance Aware Harmonic Mean (CAH) to balance infringement risk against response utility. Extensive quantitative experiments coupled with expert evaluations confirm that FUA-LLM substantially reduces problematic outputs (up to 20\%) compared to state-of-the-art approaches, while preserving real-world usability. △ Less

Submitted 25 May, 2025; originally announced May 2025.

Comments: 30 Pages

arXiv:2505.19186 [pdf, ps, other]

PosePilot: An Edge-AI Solution for Posture Correction in Physical Exercises

Authors: Rushiraj Gadhvi, Priyansh Desai, Siddharth

Abstract: Automated pose correction remains a significant challenge in AI-driven fitness systems, despite extensive research in activity recognition. This work presents PosePilot, a novel system that integrates pose recognition with real-time personalized corrective feedback, overcoming the limitations of traditional fitness solutions. Using Yoga, a discipline requiring precise spatio-temporal alignment as… ▽ More Automated pose correction remains a significant challenge in AI-driven fitness systems, despite extensive research in activity recognition. This work presents PosePilot, a novel system that integrates pose recognition with real-time personalized corrective feedback, overcoming the limitations of traditional fitness solutions. Using Yoga, a discipline requiring precise spatio-temporal alignment as a case study, we demonstrate PosePilot's ability to analyze complex physical movements. Designed for deployment on edge devices, PosePilot can be extended to various at-home and outdoor exercises. We employ a Vanilla LSTM, allowing the system to capture temporal dependencies for pose recognition. Additionally, a BiLSTM with multi-head Attention enhances the model's ability to process motion contexts, selectively focusing on key limb angles for accurate error detection while maintaining computational efficiency. As part of this work, we introduce a high-quality video dataset used for evaluating our models. Most importantly, PosePilot provides instant corrective feedback at every stage of a movement, ensuring precise posture adjustments throughout the exercise routine. The proposed approach 1) performs automatic human posture recognition, 2) provides personalized posture correction feedback at each instant which is crucial in Yoga, and 3) offers a lightweight and robust posture correction model feasible for deploying on edge devices in real-world environments. △ Less

Submitted 25 May, 2025; originally announced May 2025.

Comments: Accepted for publication at IBPRIA 2025 Conference in Coimbra, Portugal

arXiv:2504.04764 [pdf, other]

Enhancing Leaf Disease Classification Using GAT-GCN Hybrid Model

Authors: Shyam Sundhar, Riya Sharma, Priyansh Maheshwari, Suvidha Rupesh Kumar, T. Sunil Kumar

Abstract: Agriculture plays a critical role in the global economy, providing livelihoods and ensuring food security for billions. As innovative agricultural practices become more widespread, the risk of crop diseases has increased, highlighting the urgent need for efficient, low-intervention disease identification methods. This research presents a hybrid model combining Graph Attention Networks (GATs) and G… ▽ More Agriculture plays a critical role in the global economy, providing livelihoods and ensuring food security for billions. As innovative agricultural practices become more widespread, the risk of crop diseases has increased, highlighting the urgent need for efficient, low-intervention disease identification methods. This research presents a hybrid model combining Graph Attention Networks (GATs) and Graph Convolution Networks (GCNs) for leaf disease classification. GCNs have been widely used for learning from graph-structured data, and GATs enhance this by incorporating attention mechanisms to focus on the most important neighbors. The methodology integrates superpixel segmentation for efficient feature extraction, partitioning images into meaningful, homogeneous regions that better capture localized features. The authors have employed an edge augmentation technique to enhance the robustness of the model. The edge augmentation technique has introduced a significant degree of generalization in the detection capabilities of the model. To further optimize training, weight initialization techniques are applied. The hybrid model is evaluated against the individual performance of the GCN and GAT models and the hybrid model achieved a precision of 0.9822, recall of 0.9818, and F1-score of 0.9818 in apple leaf disease classification, a precision of 0.9746, recall of 0.9744, and F1-score of 0.9743 in potato leaf disease classification, and a precision of 0.8801, recall of 0.8801, and F1-score of 0.8799 in sugarcane leaf disease classification. These results demonstrate the robustness and performance of the model, suggesting its potential to support sustainable agricultural practices through precise and effective disease detection. This work is a small step towards reducing the loss of crops and hence supporting sustainable goals of zero hunger and life on land. △ Less

Submitted 7 April, 2025; originally announced April 2025.

arXiv:2503.09150 [pdf, other]

AdaptAI: A Personalized Solution to Sense Your Stress, Fix Your Mess, and Boost Productivity

Authors: Rushiraj Gadhvi, Soham Petkar, Priyansh Desai, Shreyas Ramachandran, Siddharth Siddharth

Abstract: Personalization is a critical yet often overlooked factor in boosting productivity and wellbeing in knowledge-intensive workplaces to better address individual preferences. Existing tools typically offer uniform guidance whether auto-generating email responses or prompting break reminders without accounting for individual behavioral patterns or stress triggers. We introduce AdaptAI, a multimodal A… ▽ More Personalization is a critical yet often overlooked factor in boosting productivity and wellbeing in knowledge-intensive workplaces to better address individual preferences. Existing tools typically offer uniform guidance whether auto-generating email responses or prompting break reminders without accounting for individual behavioral patterns or stress triggers. We introduce AdaptAI, a multimodal AI solution combining egocentric vision and audio, heart and motion activities, and the agentic workflow of Large Language Models LLMs to deliver highly personalized productivity support and context-aware well-being interventions. AdaptAI not only automates peripheral tasks (e.g. drafting succinct document summaries, replying to emails etc.) but also continuously monitors the users unique physiological and situational indicators to dynamically tailor interventions such as micro-break suggestions or exercise prompts, at the exact point of need. In a preliminary study with 15 participants, AdaptAI demonstrated significant improvements in task throughput and user satisfaction by anticipating user stressors and streamlining daily workflows. △ Less

Submitted 12 March, 2025; originally announced March 2025.

Comments: Accepted for publication at the ACM Conference on Human Factors in Computing Systems (CHI) Late Breaking Work 2025

arXiv:2412.03673 [pdf, other]

Interpreting Transformers for Jet Tagging

Authors: Aaron Wang, Abhijith Gandrakota, Jennifer Ngadiuba, Vivekanand Sahu, Priyansh Bhatnagar, Elham E Khoda, Javier Duarte

Abstract: Machine learning (ML) algorithms, particularly attention-based transformer models, have become indispensable for analyzing the vast data generated by particle physics experiments like ATLAS and CMS at the CERN LHC. Particle Transformer (ParT), a state-of-the-art model, leverages particle-level attention to improve jet-tagging tasks, which are critical for identifying particles resulting from proto… ▽ More Machine learning (ML) algorithms, particularly attention-based transformer models, have become indispensable for analyzing the vast data generated by particle physics experiments like ATLAS and CMS at the CERN LHC. Particle Transformer (ParT), a state-of-the-art model, leverages particle-level attention to improve jet-tagging tasks, which are critical for identifying particles resulting from proton collisions. This study focuses on interpreting ParT by analyzing attention heat maps and particle-pair correlations on the $η$-$φ$ plane, revealing a binary attention pattern where each particle attends to at most one other particle. At the same time, we observe that ParT shows varying focus on important particles and subjets depending on decay, indicating that the model learns traditional jet substructure observables. These insights enhance our understanding of the model's internal workings and learning process, offering potential avenues for improving the efficiency of transformer architectures in future high-energy physics applications. △ Less

Submitted 8 December, 2024; v1 submitted 4 December, 2024; originally announced December 2024.

Comments: Accepted at the Machine Learning and the Physical Sciences Workshop, NeurIPS 2024

Report number: FERMILAB-CONF-24-0868-CMS-LDRD

arXiv:2411.15457 [pdf, other]

Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset

Authors: Sukhandeep Kaur, Mubashir Buhari, Naman Khandelwal, Priyansh Tyagi, Kiran Sharma

Abstract: Deepfakes offer great potential for innovation and creativity, but they also pose significant risks to privacy, trust, and security. With a vast Hindi-speaking population, India is particularly vulnerable to deepfake-driven misinformation campaigns. Fake videos or speeches in Hindi can have an enormous impact on rural and semi-urban communities, where digital literacy tends to be lower and people… ▽ More Deepfakes offer great potential for innovation and creativity, but they also pose significant risks to privacy, trust, and security. With a vast Hindi-speaking population, India is particularly vulnerable to deepfake-driven misinformation campaigns. Fake videos or speeches in Hindi can have an enormous impact on rural and semi-urban communities, where digital literacy tends to be lower and people are more inclined to trust video content. The development of effective frameworks and detection tools to combat deepfake misuse requires high-quality, diverse, and extensive datasets. The existing popular datasets like FF-DF (FaceForensics++), and DFDC (DeepFake Detection Challenge) are based on English language.. Hence, this paper aims to create a first novel Hindi deep fake dataset, named ``Hindi audio-video-Deepfake'' (HAV-DF). The dataset has been generated using the faceswap, lipsyn and voice cloning methods. This multi-step process allows us to create a rich, varied dataset that captures the nuances of Hindi speech and facial expressions, providing a robust foundation for training and evaluating deepfake detection models in a Hindi language context. It is unique of its kind as all of the previous datasets contain either deepfake videos or synthesized audio. This type of deepfake dataset can be used for training a detector for both deepfake video and audio datasets. Notably, the newly introduced HAV-DF dataset demonstrates lower detection accuracy's across existing detection methods like Headpose, Xception-c40, etc. Compared to other well-known datasets FF-DF, and DFDC. This trend suggests that the HAV-DF dataset presents deeper challenges to detect, possibly due to its focus on Hindi language content and diverse manipulation techniques. The HAV-DF dataset fills the gap in Hindi-specific deepfake datasets, aiding multilingual deepfake detection development. △ Less

Submitted 23 November, 2024; originally announced November 2024.

arXiv:2411.10543 [pdf, other]

SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism

Authors: Priyansh Bhatnagar, Linfeng Wen, Mingu Kang

Abstract: Extensive efforts have been made to boost the performance in the domain of language models by introducing various attention-based transformers. However, the inclusion of linear layers with large dimensions contributes to significant computational and memory overheads. The escalating computational demands of these models necessitate the development of various compression techniques to ensure their… ▽ More Extensive efforts have been made to boost the performance in the domain of language models by introducing various attention-based transformers. However, the inclusion of linear layers with large dimensions contributes to significant computational and memory overheads. The escalating computational demands of these models necessitate the development of various compression techniques to ensure their deployment on devices, particularly in resource-constrained environments. In this paper, we propose a novel compression methodology that dynamically determines the rank of each layer using a soft thresholding mechanism, which clips the singular values with a small magnitude in a differentiable form. This approach automates the decision-making process to identify the optimal degree of compression for each layer. We have successfully applied the proposed technique to attention-based architectures, including BERT for discriminative tasks and GPT2 and TinyLlama for generative tasks. Additionally, we have validated our method on Mamba, a recently proposed state-space model. Our experiments demonstrate that the proposed technique achieves a speed-up of 1.33X to 1.72X in the encoder/ decoder with a 50% reduction in total parameters. △ Less

Submitted 15 November, 2024; originally announced November 2024.

arXiv:2403.02240 [pdf, other]

doi 10.1016/B978-0-443-29096-1.00008-8

Quantum Computing: Vision and Challenges

Authors: Sukhpal Singh Gill, Oktay Cetinkaya, Stefano Marrone, Daniel Claudino, David Haunschild, Leon Schlote, Huaming Wu, Carlo Ottaviani, Xiaoyuan Liu, Sree Pragna Machupalli, Kamalpreet Kaur, Priyansh Arora, Ji Liu, Ahmed Farouk, Houbing Herbert Song, Steve Uhlig, Kotagiri Ramamohanarao

Abstract: The recent development of quantum computing, which uses entanglement, superposition, and other quantum fundamental concepts, can provide substantial processing advantages over traditional computing. These quantum features help solve many complex problems that cannot be solved otherwise with conventional computing methods. These problems include modeling quantum mechanics, logistics, chemical-based… ▽ More The recent development of quantum computing, which uses entanglement, superposition, and other quantum fundamental concepts, can provide substantial processing advantages over traditional computing. These quantum features help solve many complex problems that cannot be solved otherwise with conventional computing methods. These problems include modeling quantum mechanics, logistics, chemical-based advances, drug design, statistical science, sustainable energy, banking, reliable communication, and quantum chemical engineering. The last few years have witnessed remarkable progress in quantum software and algorithm creation and quantum hardware research, which has significantly advanced the prospect of realizing quantum computers. It would be helpful to have comprehensive literature research on this area to grasp the current status and find outstanding problems that require considerable attention from the research community working in the quantum computing industry. To better understand quantum computing, this paper examines the foundations and vision based on current research in this area. We discuss cutting-edge developments in quantum computer hardware advancement and subsequent advances in quantum cryptography, quantum software, and high-scalability quantum computers. Many potential challenges and exciting new trends for quantum technology research and development are highlighted in this paper for a broader debate. △ Less

Submitted 7 April, 2025; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: Final Preprint Version Accepted for Publication in Elsevier Book - Quantum Computing Principles and Paradigms, 2025

Journal ref: Published In book: Quantum Computing: Principles and Paradigms, Chapter 2, Pages 19-42, Morgan Kaufmann, July 2025

arXiv:2401.02469 [pdf, other]

doi 10.1016/j.teler.2024.100116

Modern Computing: Vision and Challenges

Authors: Sukhpal Singh Gill, Huaming Wu, Panos Patros, Carlo Ottaviani, Priyansh Arora, Victor Casamayor Pujol, David Haunschild, Ajith Kumar Parlikad, Oktay Cetinkaya, Hanan Lutfiyya, Vlado Stankovski, Ruidong Li, Yuemin Ding, Junaid Qadir, Ajith Abraham, Soumya K. Ghosh, Houbing Herbert Song, Rizos Sakellariou, Omer Rana, Joel J. P. C. Rodrigues, Salil S. Kanhere, Schahram Dustdar, Steve Uhlig, Kotagiri Ramamohanarao, Rajkumar Buyya

Abstract: Over the past six decades, the computing systems field has experienced significant transformations, profoundly impacting society with transformational developments, such as the Internet and the commodification of computing. Underpinned by technological advancements, computer systems, far from being static, have been continuously evolving and adapting to cover multifaceted societal niches. This has… ▽ More Over the past six decades, the computing systems field has experienced significant transformations, profoundly impacting society with transformational developments, such as the Internet and the commodification of computing. Underpinned by technological advancements, computer systems, far from being static, have been continuously evolving and adapting to cover multifaceted societal niches. This has led to new paradigms such as cloud, fog, edge computing, and the Internet of Things (IoT), which offer fresh economic and creative opportunities. Nevertheless, this rapid change poses complex research challenges, especially in maximizing potential and enhancing functionality. As such, to maintain an economical level of performance that meets ever-tighter requirements, one must understand the drivers of new model emergence and expansion, and how contemporary challenges differ from past ones. To that end, this article investigates and assesses the factors influencing the evolution of computing systems, covering established systems and architectures as well as newer developments, such as serverless computing, quantum computing, and on-device AI on edge devices. Trends emerge when one traces technological trajectory, which includes the rapid obsolescence of frameworks due to business and technical constraints, a move towards specialized systems and models, and varying approaches to centralized and decentralized control. This comprehensive review of modern computing systems looks ahead to the future of research in the field, highlighting key challenges and emerging trends, and underscoring their importance in cost-effectively driving technological progress. △ Less

Submitted 4 January, 2024; originally announced January 2024.

Comments: Preprint submitted to Telematics and Informatics Reports, Elsevier (2024)

Journal ref: Elsevier Telematics and Informatics Reports, Volume 13, March 2024

arXiv:2307.14748 [pdf, other]

Semantic Image Completion and Enhancement using GANs

Authors: Priyansh Saxena, Raahat Gupta, Akshat Maheshwari, Saumil Maheshwari

Abstract: Semantic inpainting or image completion alludes to the task of inferring arbitrary large missing regions in images based on image semantics. Since the prediction of image pixels requires an indication of high-level context, this makes it significantly tougher than image completion, which is often more concerned with correcting data corruption and removing entire objects from the input image. On th… ▽ More Semantic inpainting or image completion alludes to the task of inferring arbitrary large missing regions in images based on image semantics. Since the prediction of image pixels requires an indication of high-level context, this makes it significantly tougher than image completion, which is often more concerned with correcting data corruption and removing entire objects from the input image. On the other hand, image enhancement attempts to eliminate unwanted noise and blur from the image, along with sustaining most of the image details. Efficient image completion and enhancement model should be able to recover the corrupted and masked regions in images and then refine the image further to increase the quality of the output image. Generative Adversarial Networks (GAN), have turned out to be helpful in picture completion tasks. In this chapter, we will discuss the underlying GAN architecture and how they can be used used for image completion tasks. △ Less

Submitted 27 July, 2023; originally announced July 2023.

Comments: This work is part of 'High-Performance Vision Intelligence'; Part of the Studies in Computational Intelligence book series (SCI, volume 913) and can be accessed at: https://link.springer.com/chapter/10.1007/978-981-15-6844-2_11. arXiv admin note: substantial text overlap with arXiv:1911.02222

arXiv:2307.12133 [pdf, other]

Route Planning Using Nature-Inspired Algorithms

Authors: Priyansh Saxena, Raahat Gupta, Akshat Maheshwari

Abstract: There are many different heuristic algorithms for solving combinatorial optimization problems that are commonly described as Nature-Inspired Algorithms (NIAs). Generally, they are inspired by some natural phenomenon, and due to their inherent converging and stochastic nature, they are known to give optimal results when compared to classical approaches. There are a large number of applications of N… ▽ More There are many different heuristic algorithms for solving combinatorial optimization problems that are commonly described as Nature-Inspired Algorithms (NIAs). Generally, they are inspired by some natural phenomenon, and due to their inherent converging and stochastic nature, they are known to give optimal results when compared to classical approaches. There are a large number of applications of NIAs, perhaps the most popular being route planning problems in robotics - problems that require a sequence of translation and rotation steps from the start to the goal in an optimized manner while avoiding obstacles in the environment. In this chapter, we will first give an overview of Nature-Inspired Algorithms, followed by their classification and common examples. We will then discuss how the NIAs have applied to solve the route planning problem. △ Less

Submitted 22 July, 2023; originally announced July 2023.

Comments: This work is part of 'High-Performance Vision Intelligence'; Part of the Studies in Computational Intelligence book series (SCI,volume 913) and can be accessed at: https://link.springer.com/chapter/10.1007/978-981-15-6844-2_15

arXiv:2306.03823 [pdf]

doi 10.1016/j.iotcps.2023.06.002

Transformative Effects of ChatGPT on Modern Education: Emerging Era of AI Chatbots

Authors: Sukhpal Singh Gill, Minxian Xu, Panos Patros, Huaming Wu, Rupinder Kaur, Kamalpreet Kaur, Stephanie Fuller, Manmeet Singh, Priyansh Arora, Ajith Kumar Parlikad, Vlado Stankovski, Ajith Abraham, Soumya K. Ghosh, Hanan Lutfiyya, Salil S. Kanhere, Rami Bahsoon, Omer Rana, Schahram Dustdar, Rizos Sakellariou, Steve Uhlig, Rajkumar Buyya

Abstract: ChatGPT, an AI-based chatbot, was released to provide coherent and useful replies based on analysis of large volumes of data. In this article, leading scientists, researchers and engineers discuss the transformative effects of ChatGPT on modern education. This research seeks to improve our knowledge of ChatGPT capabilities and its use in the education sector, identifying potential concerns and cha… ▽ More ChatGPT, an AI-based chatbot, was released to provide coherent and useful replies based on analysis of large volumes of data. In this article, leading scientists, researchers and engineers discuss the transformative effects of ChatGPT on modern education. This research seeks to improve our knowledge of ChatGPT capabilities and its use in the education sector, identifying potential concerns and challenges. Our preliminary evaluation concludes that ChatGPT performed differently in each subject area including finance, coding and maths. While ChatGPT has the ability to help educators by creating instructional content, offering suggestions and acting as an online educator to learners by answering questions and promoting group work, there are clear drawbacks in its use, such as the possibility of producing inaccurate or false data and circumventing duplicate content (plagiarism) detectors where originality is essential. The often reported hallucinations within Generative AI in general, and also relevant for ChatGPT, can render its use of limited benefit where accuracy is essential. What ChatGPT lacks is a stochastic measure to help provide sincere and sensitive communication with its users. Academic regulations and evaluation practices used in educational institutions need to be updated, should ChatGPT be used as a tool in education. To address the transformative effects of ChatGPT on the learning environment, educating teachers and students alike about its capabilities and limitations will be crucial. △ Less

Submitted 25 May, 2023; originally announced June 2023.

Comments: Preprint submitted to IoTCPS Elsevier (2023)

Journal ref: Internet of Things and Cyber-Physical Systems (Elsevier), Volume 4, 2024, Pages 19-23

arXiv:2301.02113 [pdf, other]

Anaphora Resolution in Dialogue: System Description (CODI-CRAC 2022 Shared Task)

Authors: Tatiana Anikina, Natalia Skachkova, Joseph Renner, Priyansh Trivedi

Abstract: We describe three models submitted for the CODI-CRAC 2022 shared task. To perform identity anaphora resolution, we test several combinations of the incremental clustering approach based on the Workspace Coreference System (WCS) with other coreference models. The best result is achieved by adding the ''cluster merging'' version of the coref-hoi model, which brings up to 10.33% improvement 1 over va… ▽ More We describe three models submitted for the CODI-CRAC 2022 shared task. To perform identity anaphora resolution, we test several combinations of the incremental clustering approach based on the Workspace Coreference System (WCS) with other coreference models. The best result is achieved by adding the ''cluster merging'' version of the coref-hoi model, which brings up to 10.33% improvement 1 over vanilla WCS clustering. Discourse deixis resolution is implemented as multi-task learning: we combine the learning objective of corefhoi with anaphor type classification. We adapt the higher-order resolution model introduced in Joshi et al. (2019) for bridging resolution given gold mentions and anaphors. △ Less

Submitted 5 January, 2023; originally announced January 2023.

Journal ref: CODI-CRAC 2022, Oct 2022, Gyeongju, South Korea

arXiv:2111.03274 [pdf]

doi 10.2174/2666255813999200904113251

Pathological Analysis of Blood Cells Using Deep Learning Techniques

Authors: Virender Ranga, Shivam Gupta, Priyansh Agrawal, Jyoti Meena

Abstract: Pathology deals with the practice of discovering the reasons for disease by analyzing the body samples. The most used way in this field, is to use histology which is basically studying and viewing microscopic structures of cell and tissues. The slide viewing method is widely being used and converted into digital form to produce high resolution images. This enabled the area of deep learning and mac… ▽ More Pathology deals with the practice of discovering the reasons for disease by analyzing the body samples. The most used way in this field, is to use histology which is basically studying and viewing microscopic structures of cell and tissues. The slide viewing method is widely being used and converted into digital form to produce high resolution images. This enabled the area of deep learning and machine learning to deep dive into this field of medical sciences. In the present study, a neural based network has been proposed for classification of blood cells images into various categories. When input image is passed through the proposed architecture and all the hyper parameters and dropout ratio values are used in accordance with proposed algorithm, then model classifies the blood images with an accuracy of 95.24%. The performance of proposed model is better than existing standard architectures and work done by various researchers. Thus model will enable development of pathological system which will reduce human errors and daily load on laboratory men. This will in turn help pathologists in carrying out their work more efficiently and effectively. △ Less

Submitted 5 November, 2021; originally announced November 2021.

Comments: 6 Page, 3 Table and 6 Figures

Journal ref: Recent Advances in Computer Science and Communications(Formerly Recent Patents on Computer Science),04 September,2020, Article ID e140921185564

arXiv:2111.03270 [pdf]

doi 10.1080/03091902.2020.1791988

Automated Human Mind Reading Using EEG Signals for Seizure Detection

Authors: Virender Ranga, Shivam Gupta, Jyoti Meena, Priyansh Agrawal

Abstract: Epilepsy is one of the most occurring neurological disease globally emerged back in 4000 BC. It is affecting around 50 million people of all ages these days. The trait of this disease is recurrent seizures. In the past few decades, the treatments available for seizure control have improved a lot with the advancements in the field of medical science and technology. Electroencephalogram (EEG) is a w… ▽ More Epilepsy is one of the most occurring neurological disease globally emerged back in 4000 BC. It is affecting around 50 million people of all ages these days. The trait of this disease is recurrent seizures. In the past few decades, the treatments available for seizure control have improved a lot with the advancements in the field of medical science and technology. Electroencephalogram (EEG) is a widely used technique for monitoring the brain activity and widely popular for seizure region detection. It is performed before surgery and also to predict seizure at the time operation which is useful in neuro stimulation device. But in most of cases visual examination is done by neurologist in order to detect and classify patterns of the disease but this requires a lot of pre-domain knowledge and experience. This all in turns put a pressure on neurosurgeons and leads to time wastage and also reduce their accuracy and efficiency. There is a need of some automated systems in arena of information technology like use of neural networks in deep learning which can assist neurologists. In the present paper, a model is proposed to give an accuracy of 98.33% which can be used for development of automated systems. The developed system will significantly help neurologists in their performance. △ Less

Submitted 5 November, 2021; originally announced November 2021.

Comments: 11 Pages, 12 Figures, 5 Tables

Journal ref: Journal of Medical Engineering & Technology,2020, 44:5, 237-246

arXiv:2111.03265 [pdf]

doi 10.14201/ADCAIJ2021104429446

EpilNet: A Novel Approach to IoT based Epileptic Seizure Prediction and Diagnosis System using Artificial Intelligence

Authors: Shivam Gupta, Virender Ranga, Priyansh Agrawal

Abstract: Epilepsy is one of the most occurring neurological diseases. The main characteristic of this disease is a frequent seizure, which is an electrical imbalance in the brain. It is generally accompanied by shaking of body parts and even leads (fainting). In the past few years, many treatments have come up. These mainly involve the use of anti-seizure drugs for controlling seizures. But in 70% of cases… ▽ More Epilepsy is one of the most occurring neurological diseases. The main characteristic of this disease is a frequent seizure, which is an electrical imbalance in the brain. It is generally accompanied by shaking of body parts and even leads (fainting). In the past few years, many treatments have come up. These mainly involve the use of anti-seizure drugs for controlling seizures. But in 70% of cases, these drugs are not effective, and surgery is the only solution when the condition worsens. So patients need to take care of themselves while having a seizure and be safe. Wearable electroencephalogram (EEG) devices have come up with the development in medical science and technology. These devices help in the analysis of brain electrical activities. EEG helps in locating the affected cortical region. The most important is that it can predict any seizure in advance on-site. This has resulted in a sudden increase in demand for effective and efficient seizure prediction and diagnosis systems. A novel approach to epileptic seizure prediction and diagnosis system EpilNet is proposed in the present paper. It is a one-dimensional (1D) convolution neural network. EpilNet gives the testing accuracy of 79.13% for five classes, leading to a significant increase of about 6-7% compared to related works. The developed Web API helps in bringing EpilNet into practical use. Thus, it is an integrated system for both patients and doctors. The system will help patients prevent injury or accidents and increase the efficiency of the treatment process by doctors in the hospitals. △ Less

Submitted 5 November, 2021; originally announced November 2021.

Comments: 12 Pages, 12 Figures, 2 Tables

Journal ref: ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, Issue, Vol. 10 N. 4 (2021), 429-446

arXiv:2009.10847 [pdf, other]

Message Passing for Hyper-Relational Knowledge Graphs

Authors: Mikhail Galkin, Priyansh Trivedi, Gaurav Maheshwari, Ricardo Usbeck, Jens Lehmann

Abstract: Hyper-relational knowledge graphs (KGs) (e.g., Wikidata) enable associating additional key-value pairs along with the main triple to disambiguate, or restrict the validity of a fact. In this work, we propose a message passing based graph encoder - StarE capable of modeling such hyper-relational KGs. Unlike existing approaches, StarE can encode an arbitrary number of additional information (qualifi… ▽ More Hyper-relational knowledge graphs (KGs) (e.g., Wikidata) enable associating additional key-value pairs along with the main triple to disambiguate, or restrict the validity of a fact. In this work, we propose a message passing based graph encoder - StarE capable of modeling such hyper-relational KGs. Unlike existing approaches, StarE can encode an arbitrary number of additional information (qualifiers) along with the main triple while keeping the semantic roles of qualifiers and triples intact. We also demonstrate that existing benchmarks for evaluating link prediction (LP) performance on hyper-relational KGs suffer from fundamental flaws and thus develop a new Wikidata-based dataset - WD50K. Our experiments demonstrate that StarE based LP model outperforms existing approaches across multiple benchmarks. We also confirm that leveraging qualifiers is vital for link prediction with gains up to 25 MRR points compared to triple-based representations. △ Less

Submitted 22 September, 2020; originally announced September 2020.

Comments: Accepted to EMNLP 2020

arXiv:1911.10519 [pdf, other]

doi 10.1080/0952813X.2022.2059107

Three Dimensional Route Planning for Multiple Unmanned Aerial Vehicles using Salp Swarm Algorithm

Authors: Priyansh Saxena, Ram Kishan Dewangan

Abstract: Route planning for multiple Unmanned Aerial Vehicles (UAVs) is a series of translation and rotational steps from a given start location to the destination goal location. The goal of the route planning problem is to determine the most optimal route avoiding any collisions with the obstacles present in the environment. Route planning is an NP-hard optimization problem. In this paper, a newly propose… ▽ More Route planning for multiple Unmanned Aerial Vehicles (UAVs) is a series of translation and rotational steps from a given start location to the destination goal location. The goal of the route planning problem is to determine the most optimal route avoiding any collisions with the obstacles present in the environment. Route planning is an NP-hard optimization problem. In this paper, a newly proposed Salp Swarm Algorithm (SSA) is used, and its performance is compared with deterministic and other Nature-Inspired Algorithms (NIAs). The results illustrate that SSA outperforms all the other meta-heuristic algorithms in route planning for multiple UAVs in a 3D environment. The proposed approach improves the average cost and overall time by 1.25% and 6.035% respectively when compared to recently reported data. Route planning is involved in many real-life applications like robot navigation, self-driving car, autonomous UAV for search and rescue operations in dangerous ground-zero situations, civilian surveillance, military combat and even commercial services like package delivery by drones. △ Less

Submitted 16 July, 2023; v1 submitted 24 November, 2019; originally announced November 2019.

Comments: This work has been previously published in the 'Journal of Experimental & Theoretical Artificial Intelligence' and can be accessed at https://www.tandfonline.com/doi/abs/10.1080/0952813X.2022.2059107

arXiv:1911.02268 [pdf, other]

Robot navigation and target capturing using nature-inspired approaches in a dynamic environment

Authors: Devansh Verma, Priyansh Saxena, Ritu Tiwari

Abstract: Path Planning and target searching in a three-dimensional environment is a challenging task in the field of robotics. It is an optimization problem as the path from source to destination has to be optimal. This paper aims to generate a collision-free trajectory in a dynamic environment. The path planning problem has sought to be of extreme importance in the military, search and rescue missions and… ▽ More Path Planning and target searching in a three-dimensional environment is a challenging task in the field of robotics. It is an optimization problem as the path from source to destination has to be optimal. This paper aims to generate a collision-free trajectory in a dynamic environment. The path planning problem has sought to be of extreme importance in the military, search and rescue missions and in life-saving tasks. During its operation, the unmanned air vehicle operates in a hostile environment, and faster replanning is needed to reach the target as optimally as possible. This paper presents a novel approach of hierarchical planning using multiresolution abstract levels for faster replanning. Economic constraints like path length, total path planning time and the number of turns are taken into consideration that mandate the use of cost functions. Experimental results show that the hierarchical version of GSO gives better performance compared to the BBO, IWO and their hierarchical versions. △ Less

Submitted 6 November, 2019; originally announced November 2019.

Comments: 8 pages, 8 figures

arXiv:1911.02265 [pdf, other]

doi 10.1007/978-981-15-6067-5_30

Predictive modeling of brain tumor: A Deep learning approach

Authors: Priyansh Saxena, Akshat Maheshwari, Saumil Maheshwari

Abstract: Image processing concepts can visualize the different anatomy structure of the human body. Recent advancements in the field of deep learning have made it possible to detect the growth of cancerous tissue just by a patient's brain Magnetic Resonance Imaging (MRI) scans. These methods require very high accuracy and meager false negative rates to be of any practical use. This paper presents a Convolu… ▽ More Image processing concepts can visualize the different anatomy structure of the human body. Recent advancements in the field of deep learning have made it possible to detect the growth of cancerous tissue just by a patient's brain Magnetic Resonance Imaging (MRI) scans. These methods require very high accuracy and meager false negative rates to be of any practical use. This paper presents a Convolutional Neural Network (CNN) based transfer learning approach to classify the brain MRI scans into two classes using three pre-trained models. The performances of these models are compared with each other. Experimental results show that the Resnet-50 model achieves the highest accuracy and least false negative rates as 95% and zero respectively. It is followed by VGG-16 and Inception-V3 model with an accuracy of 90% and 55% respectively. △ Less

Submitted 16 July, 2023; v1 submitted 6 November, 2019; originally announced November 2019.

Comments: This work is part of the conference proceeding 'Proceedings of the International Conference on Artificial Intelligence' and can be accessed at https://link.springer.com/chapter/10.1007/978-981-15-6067-5_30

arXiv:1911.02222 [pdf, other]

doi 10.1109/ICCCNT45670.2019.8944750

Semantic Image Completion and Enhancement using Deep Learning

Authors: Vaishnav Chandak, Priyansh Saxena, Manisha Pattanaik, Gaurav Kaushal

Abstract: In real-life applications, certain images utilized are corrupted in which the image pixels are damaged or missing, which increases the complexity of computer vision tasks. In this paper, a deep learning architecture is proposed to deal with image completion and enhancement. Generative Adversarial Networks (GAN), has been turned out to be helpful in picture completion tasks. Therefore, in GANs, Was… ▽ More In real-life applications, certain images utilized are corrupted in which the image pixels are damaged or missing, which increases the complexity of computer vision tasks. In this paper, a deep learning architecture is proposed to deal with image completion and enhancement. Generative Adversarial Networks (GAN), has been turned out to be helpful in picture completion tasks. Therefore, in GANs, Wasserstein GAN architecture is used for image completion which creates the coarse patches to filling the missing region in the distorted picture, and the enhancement network will additionally refine the resultant pictures utilizing residual learning procedures and hence give better complete pictures for computer vision applications. Experimental outcomes show that the proposed approach improves the Peak Signal to Noise ratio and Structural Similarity Index values by 2.45% and 4% respectively when compared to the recently reported data. △ Less

Submitted 5 January, 2020; v1 submitted 6 November, 2019; originally announced November 2019.

Comments: 6 pages, 8 figures. Proceedings of "The 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT)". Conference Proceedings ISBN Number: ISBN: 978-1-5386-5906-9; Link: https://ieeexplore.ieee.org/document/8944750

arXiv:1907.09361 [pdf, other]

Introduction to Neural Network based Approaches for Question Answering over Knowledge Graphs

Authors: Nilesh Chakraborty, Denis Lukovnikov, Gaurav Maheshwari, Priyansh Trivedi, Jens Lehmann, Asja Fischer

Abstract: Question answering has emerged as an intuitive way of querying structured data sources, and has attracted significant advancements over the years. In this article, we provide an overview over these recent advancements, focusing on neural network based question answering systems over knowledge graphs. We introduce readers to the challenges in the tasks, current paradigms of approaches, discuss nota… ▽ More Question answering has emerged as an intuitive way of querying structured data sources, and has attracted significant advancements over the years. In this article, we provide an overview over these recent advancements, focusing on neural network based question answering systems over knowledge graphs. We introduce readers to the challenges in the tasks, current paradigms of approaches, discuss notable advancements, and outline the emerging trends in the field. Through this article, we aim to provide newcomers to the field with a suitable entry point, and ease their process of making informed decisions while creating their own QA system. △ Less

Submitted 22 July, 2019; originally announced July 2019.

Comments: Preprint, under review. The first four authors contributed equally to this paper, and should be regarded as co-first authors

arXiv:1811.01118 [pdf, other]

Learning to Rank Query Graphs for Complex Question Answering over Knowledge Graphs

Authors: Gaurav Maheshwari, Priyansh Trivedi, Denis Lukovnikov, Nilesh Chakraborty, Asja Fischer, Jens Lehmann

Abstract: In this paper, we conduct an empirical investigation of neural query graph ranking approaches for the task of complex question answering over knowledge graphs. We experiment with six different ranking models and propose a novel self-attention based slot matching model which exploits the inherent structure of query graphs, our logical form of choice. Our proposed model generally outperforms the oth… ▽ More In this paper, we conduct an empirical investigation of neural query graph ranking approaches for the task of complex question answering over knowledge graphs. We experiment with six different ranking models and propose a novel self-attention based slot matching model which exploits the inherent structure of query graphs, our logical form of choice. Our proposed model generally outperforms the other models on two QA datasets over the DBpedia knowledge graph, evaluated in different settings. In addition, we show that transfer learning from the larger of those QA datasets to the smaller dataset yields substantial improvements, effectively offsetting the general lack of training data. △ Less

Submitted 2 November, 2018; originally announced November 2018.

arXiv:1802.03701 [pdf, other]

Formal Ontology Learning from English IS-A Sentences

Authors: Sourish Dasgupta, Ankur Padia, Gaurav Maheshwari, Priyansh Trivedi, Jens Lehmann

Abstract: Ontology learning (OL) is the process of automatically generating an ontological knowledge base from a plain text document. In this paper, we propose a new ontology learning approach and tool, called DLOL, which generates a knowledge base in the description logic (DL) SHOQ(D) from a collection of factual non-negative IS-A sentences in English. We provide extensive experimental results on the accur… ▽ More Ontology learning (OL) is the process of automatically generating an ontological knowledge base from a plain text document. In this paper, we propose a new ontology learning approach and tool, called DLOL, which generates a knowledge base in the description logic (DL) SHOQ(D) from a collection of factual non-negative IS-A sentences in English. We provide extensive experimental results on the accuracy of DLOL, giving experimental comparisons to three state-of-the-art existing OL tools, namely Text2Onto, FRED, and LExO. Here, we use the standard OL accuracy measure, called lexical accuracy, and a novel OL accuracy measure, called instance-based inference model. In our experimental results, DLOL turns out to be about 21% and 46%, respectively, better than the best of the other three approaches. △ Less

Submitted 11 February, 2018; originally announced February 2018.

arXiv:1611.04822 [pdf, other]

SimDoc: Topic Sequence Alignment based Document Similarity Framework

Authors: Gaurav Maheshwari, Priyansh Trivedi, Harshita Sahijwani, Kunal Jha, Sourish Dasgupta, Jens Lehmann

Abstract: Document similarity is the problem of estimating the degree to which a given pair of documents has similar semantic content. An accurate document similarity measure can improve several enterprise relevant tasks such as document clustering, text mining, and question-answering. In this paper, we show that a document's thematic flow, which is often disregarded by bag-of-word techniques, is pivotal in… ▽ More Document similarity is the problem of estimating the degree to which a given pair of documents has similar semantic content. An accurate document similarity measure can improve several enterprise relevant tasks such as document clustering, text mining, and question-answering. In this paper, we show that a document's thematic flow, which is often disregarded by bag-of-word techniques, is pivotal in estimating their similarity. To this end, we propose a novel semantic document similarity framework, called SimDoc. We model documents as topic-sequences, where topics represent latent generative clusters of related words. Then, we use a sequence alignment algorithm to estimate their semantic similarity. We further conceptualize a novel mechanism to compute topic-topic similarity to fine tune our system. In our experiments, we show that SimDoc outperforms many contemporary bag-of-words techniques in accurately computing document similarity, and on practical applications such as document clustering. △ Less

Submitted 11 November, 2017; v1 submitted 15 November, 2016; originally announced November 2016.

arXiv:1503.05667 [pdf, other]

BitSim: An Algebraic Similarity Measure for Description Logics Concepts

Authors: Sourish Dasgupta, Gaurav Maheshwari, Priyansh Trivedi

Abstract: In this paper, we propose an algebraic similarity measure σBS (BS stands for BitSim) for assigning semantic similarity score to concept definitions in ALCH+ an expressive fragment of Description Logics (DL). We define an algebraic interpretation function, I_B, that maps a concept definition to a unique string (ω_B) called bit-code) over an alphabet Σ_B of 11 symbols belonging to L_B - the language… ▽ More In this paper, we propose an algebraic similarity measure σBS (BS stands for BitSim) for assigning semantic similarity score to concept definitions in ALCH+ an expressive fragment of Description Logics (DL). We define an algebraic interpretation function, I_B, that maps a concept definition to a unique string (ω_B) called bit-code) over an alphabet Σ_B of 11 symbols belonging to L_B - the language over P B. IB has semantic correspondence with conventional model-theoretic interpretation of DL. We then define σ_BS on L_B. A detailed analysis of I_B and σ_BS has been given. △ Less

Submitted 19 March, 2015; originally announced March 2015.

Showing 1–33 of 33 results for author: Priyansh