-
State-of-the-art Small Language Coder Model: Mify-Coder
Authors:
Abhinav Parmar,
Abhisek Panigrahi,
Abhishek Kumar Dwivedi,
Abhishek Bhattacharya,
Adarsh Ramachandra,
Aditya Choudhary,
Aditya Garg,
Aditya Raj,
Alankrit Bhatt,
Alpesh Yadav,
Anant Vishnu,
Ananthu Pillai,
Ankush Kumar,
Aryan Patnaik,
Aswatha Narayanan S,
Avanish Raj Singh,
Bhavya Shree Gadda,
Brijesh Pankajbhai Kachhadiya,
Buggala Jahnavi,
Chidurala Nithin Krishna,
Chintan Shah,
Chunduru Akshaya,
Debarshi Banerjee,
Debrup Dey,
Deepa R.,
et al. (71 additional authors not shown)
Abstract:
We present Mify-Coder, a 2.5B-parameter code model trained on 4.2T tokens using a compute-optimal strategy built on the Mify-2.5B foundation model. Mify-Coder achieves comparable accuracy and safety while significantly outperforming much larger baseline models on standard coding and function-calling benchmarks, demonstrating that compact models can match frontier-grade models in code generation and agent-driven workflows. Our training pipeline combines high-quality curated sources with synthetic data generated through agentically designed prompts, refined iteratively using enterprise-grade evaluation datasets. LLM-based quality filtering further enhances data density, enabling frugal yet effective training. Through disciplined exploration of CPT-SFT objectives, data mixtures, and sampling dynamics, we deliver frontier-grade code intelligence within a single continuous training trajectory. Empirical evidence shows that principled data and compute discipline allow smaller models to achieve competitive accuracy, efficiency, and safety compliance. Quantized variants of Mify-Coder enable deployment on standard desktop environments without requiring specialized hardware.
Submitted 26 December, 2025;
originally announced December 2025.
-
Compressive Modeling and Visualization of Multivariate Scientific Data using Implicit Neural Representation
Authors:
Abhay Kumar Dwivedi,
Shanu Saklani,
Soumya Dutta
Abstract:
The extensive adoption of Deep Neural Networks has led to their increased utilization in challenging scientific visualization tasks. Recent advancements in building compressed data models using implicit neural representations have shown promising results for tasks like spatiotemporal volume visualization and super-resolution. Inspired by these successes, we develop compressed neural representations for multivariate datasets containing tens to hundreds of variables. Our approach utilizes a single network to learn representations for all data variables simultaneously through parameter sharing. This allows us to achieve state-of-the-art data compression. Through comprehensive evaluations, we demonstrate superior performance in terms of reconstructed data quality, rendering and visualization quality, preservation of dependency information among variables, and storage efficiency.
Submitted 17 October, 2025;
originally announced October 2025.
-
Anomaly Detection in Intra-Vehicle Networks
Authors:
Ajeet Kumar Dwivedi
Abstract:
Advances in technology and the ease of inter-connectivity among networks have allowed us to evolve towards one of the promising areas, the Internet of Vehicles. Modern vehicles are connected to a range of networks, including intra-vehicle networks and external networks. However, a primary challenge in the automotive industry is to make the vehicle safe and reliable; particularly given the loopholes in the existing traditional protocols, cyber-attacks on the vehicle network are rising drastically. Practically every vehicle uses the universal Controller Area Network (CAN) bus protocol for communication between electronic control units to transmit key vehicle functionality and messages related to driver safety. The CAN bus system, despite its critical significance, lacks two key features expected of any protocol: authentication and authorization. As a result, compromises of CAN bus security lead to serious issues for both car and driver safety. This paper discusses the security issues of the CAN bus protocol and proposes an Intrusion Detection System (IDS) that detects known attacks on in-vehicle networks. Multiple Artificial Intelligence (AI) algorithms are employed to recognize known potential cyber-attacks based on messages, timestamps, and data packets traveling through the CAN. The main objective of this paper is to accurately detect cyberattacks by considering time-series features and attack frequency. The majority of the evaluated AI algorithms, when considering attack frequency, correctly identify known attacks with an accuracy of more than 99%. However, these models achieve approximately 92% to 97% accuracy when timestamps are not taken into account. Long Short-Term Memory (LSTM), XGBoost, and SVC have proved to be well-performing classifiers.
Submitted 6 May, 2022;
originally announced May 2022.