
Showing 1–16 of 16 results for author: Khattar, A

Searching in archive cs.
  1. arXiv:2603.07685  [pdf, ps, other]

    cs.DC cs.CL cs.LG

    Scalable Training of Mixture-of-Experts Models with Megatron Core

    Authors: Zijie Yan, Hongxiao Bai, Xin Yao, Dennis Liu, Tong Liu, Hongbin Liu, Pingtian Li, Evan Wu, Shiqing Fan, Li Tao, Robin Zhang, Yuzhong Wang, Shifang Xu, Jack Chang, Xuwen Chen, Kunlun Li, Yan Bai, Gao Deng, Nan Zheng, Vijay Anand Korthikanti, Abhinav Khattar, Ethan He, Soham Govande, Sangkug Lym, Zhongbo Zhu , et al. (20 additional authors not shown)

    Abstract: Scaling Mixture-of-Experts (MoE) training introduces systems challenges absent in dense models. Because each token activates only a subset of experts, this sparsity allows total parameters to grow much faster than per-token computation, creating coupled constraints across memory, communication, and computation. Optimizing one dimension often shifts pressure to another, demanding co-design across t…

    Submitted 10 March, 2026; v1 submitted 8 March, 2026; originally announced March 2026.

    Comments: Technical Report. 88 pages, 42 figures

  2. arXiv:2601.18089  [pdf, ps, other]

    cs.LG cs.AI

    LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts

    Authors: Venmugil Elango, Nidhi Bhatia, Roger Waleffe, Rasoul Shafipour, Tomer Asida, Abhinav Khattar, Nave Assaf, Maximilian Golub, Joey Guman, Tiyasa Mitra, Ritchie Zhao, Ritika Borkar, Ran Zilberstein, Mostofa Patwary, Mohammad Shoeybi, Bita Rouhani

    Abstract: Mixture of Experts (MoEs) have become a central component of many state-of-the-art open-source and proprietary large language models. Despite their widespread adoption, it remains unclear how close existing MoE architectures are to optimal with respect to inference cost, as measured by accuracy per floating-point operation and per parameter. In this work, we revisit MoE design from a hardware-soft…

    Submitted 25 January, 2026; originally announced January 2026.

  3. arXiv:2512.20856  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    NVIDIA Nemotron 3: Efficient and Open Intelligence

    Authors: NVIDIA, :, Aaron Blakeman, Aaron Grattafiori, Aarti Basant, Abhibha Gupta, Abhinav Khattar, Adi Renduchintala, Aditya Vavre, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Kondratenko, Alexander Bukharin, Alexandre Milesi, Ali Taghibakhshi, Alisa Liu, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amir Klein, Amit Zuker, Amnon Geifman, Amy Shen, Anahita Bhiwandiwalla , et al. (334 additional authors not shown)

    Abstract: We introduce the Nemotron 3 family of models - Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities. The Nemotron 3 family uses a Mixture-of-Experts hybrid Mamba-Transformer architecture to provide best-in-class throughput and context lengths of up to 1M tokens. Super and Ultra models are trained with NVFP4 and incorporate LatentMoE, a novel appro…

    Submitted 23 December, 2025; originally announced December 2025.

  4. arXiv:2512.20848  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

    Authors: NVIDIA, :, Aaron Blakeman, Aaron Grattafiori, Aarti Basant, Abhibha Gupta, Abhinav Khattar, Adi Renduchintala, Aditya Vavre, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Kondratenko, Alexander Bukharin, Alexandre Milesi, Ali Taghibakhshi, Alisa Liu, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amir Klein, Amit Zuker, Amnon Geifman, Amy Shen, Anahita Bhiwandiwalla , et al. (289 additional authors not shown)

    Abstract: We present Nemotron 3 Nano 30B-A3B, a Mixture-of-Experts hybrid Mamba-Transformer language model. Nemotron 3 Nano was pretrained on 25 trillion text tokens, including more than 3 trillion new unique tokens over Nemotron 2, followed by supervised fine tuning and large-scale RL on diverse environments. Nemotron 3 Nano achieves better accuracy than our previous generation Nemotron 2 Nano while activa…

    Submitted 23 December, 2025; originally announced December 2025.

  5. arXiv:2508.14444  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

    Authors: NVIDIA, :, Aarti Basant, Abhijit Khairnar, Abhijit Paithankar, Abhinav Khattar, Adithya Renduchintala, Aditya Malte, Akhiad Bercovich, Akshay Hazare, Alejandra Rico, Aleksander Ficek, Alex Kondratenko, Alex Shaposhnikov, Alexander Bukharin, Ali Taghibakhshi, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amy Shen, Andrew Tao, Ann Guan, Anna Shors, Anubhav Mandarwal, Arham Mehta, Arun Venkatesan , et al. (192 additional authors not shown)

    Abstract: We introduce Nemotron-Nano-9B-v2, a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads while achieving state-of-the-art accuracy compared to similarly-sized models. Nemotron-Nano-9B-v2 builds on the Nemotron-H architecture, in which the majority of the self-attention layers in the common Transformer architecture are replaced with Mamba-2 layers, to achi…

    Submitted 2 September, 2025; v1 submitted 20 August, 2025; originally announced August 2025.

  6. arXiv:2505.00949  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Llama-Nemotron: Efficient Reasoning Models

    Authors: Akhiad Bercovich, Itay Levy, Izik Golan, Mohammad Dabbah, Ran El-Yaniv, Omri Puny, Ido Galil, Zach Moshe, Tomer Ronen, Najeeb Nabwani, Ido Shahaf, Oren Tropp, Ehud Karpas, Ran Zilberstein, Jiaqi Zeng, Soumye Singhal, Alexander Bukharin, Yian Zhang, Tugrul Konuk, Gerald Shen, Ameya Sunil Mahabaleshwarkar, Bilal Kartal, Yoshi Suhara, Olivier Delalleau, Zijia Chen , et al. (111 additional authors not shown)

    Abstract: We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior i…

    Submitted 9 September, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

  7. arXiv:2504.03624  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

    Authors: NVIDIA, :, Aaron Blakeman, Aarti Basant, Abhinav Khattar, Adithya Renduchintala, Akhiad Bercovich, Aleksander Ficek, Alexis Bjorlin, Ali Taghibakhshi, Amala Sanjay Deshmukh, Ameya Sunil Mahabaleshwarkar, Andrew Tao, Anna Shors, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Bobby Chen, Boris Ginsburg, Boxin Wang, Brandon Norick, Brian Butterfield, Bryan Catanzaro, Carlo del Mundo , et al. (176 additional authors not shown)

    Abstract: As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transf…

    Submitted 5 September, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

  8. arXiv:2410.07524  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Upcycling Large Language Models into Mixture of Experts

    Authors: Ethan He, Abhinav Khattar, Ryan Prenger, Vijay Korthikanti, Zijie Yan, Tong Liu, Shiqing Fan, Ashwath Aithal, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: Upcycling pre-trained dense language models into sparse mixture-of-experts (MoE) models is an efficient approach to increase the model capacity of already trained models. However, optimal techniques for upcycling at scale remain unclear. In this work, we conduct an extensive study of upcycling methods and hyperparameters for billion-parameter scale language models. We propose a novel "virtual grou…

    Submitted 15 June, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

  9. arXiv:2409.00829  [pdf, other]

    cs.CV cs.CG cs.GR

    Curvy: A Parametric Cross-section based Surface Reconstruction

    Authors: Aradhya N. Mathur, Apoorv Khattar, Ojaswa Sharma

    Abstract: In this work, we present a novel approach for reconstructing shape point clouds using planar sparse cross-sections with the help of generative modeling. We present unique challenges pertaining to the representation and reconstruction in this problem setting. Most methods in the classical literature lack the ability to generalize based on object class and employ complex mathematical machinery to re…

    Submitted 1 September, 2024; originally announced September 2024.

  10. arXiv:2401.12724  [pdf, other]

    cs.GR

    A Multi-scale Yarn Appearance Model with Fiber Details

    Authors: Apoorv Khattar, Junqui Zhu, Emiliano Padovani, Jean-Marie Aurby, Marc Droske, Ling-Qi Yan, Zahra Montazeri

    Abstract: Rendering realistic cloth has always been a challenge due to its intricate structure. Cloth is made up of fibers, plies, and yarns, and previous curved-based models, while detailed, were computationally expensive and inflexible for large cloth. To address this, we propose a simplified approach. We introduce a geometric aggregation technique that reduces ray-tracing computation by using fewer curve…

    Submitted 18 March, 2025; v1 submitted 23 January, 2024; originally announced January 2024.

  11. arXiv:2104.00107  [pdf, other]

    cs.CV cs.CL cs.LG

    Analysis on Image Set Visual Question Answering

    Authors: Abhinav Khattar, Aviral Joshi, Har Simrat Singh, Pulkit Goel, Rohit Prakash Barnwal

    Abstract: We tackle the challenge of Visual Question Answering in multi-image setting for the ISVQA dataset. Traditional VQA tasks have focused on a single-image setting where the target answer is generated from a single image. Image set VQA, however, comprises of a set of images and requires finding connection between images, relate the objects across images based on these connections and generate a unifie…

    Submitted 31 March, 2021; originally announced April 2021.

  12. arXiv:2004.11702  [pdf, other]

    eess.IV cs.GR

    Multimodal Medical Volume Colorization from 2D Style

    Authors: Aradhya Neeraj Mathur, Apoorv Khattar, Ojaswa Sharma

    Abstract: Colorization involves the synthesis of colors on a target image while preserving structural content as well as the semantics of the target image. This is a well-explored problem in 2D with many state-of-the-art solutions. We propose a novel deep learning-based approach for the colorization of 3D medical volumes. Our system is capable of directly mapping the colors of a 2D photograph to a 3D MRI vo…

    Submitted 6 April, 2020; originally announced April 2020.

  13. arXiv:1903.04879  [pdf, other]

    cs.SI cs.CY cs.LG

    What sets Verified Users apart? Insights, Analysis and Prediction of Verified Users on Twitter

    Authors: Indraneil Paul, Abhinav Khattar, Shaan Chopra, Ponnurangam Kumaraguru, Manish Gupta

    Abstract: Social network and publishing platforms, such as Twitter, support the concept of a secret proprietary verification process, for handles they deem worthy of platform-wide public interest. In line with significant prior work which suggests that possessing such a status symbolizes enhanced credibility in the eyes of the platform audience, a verified badge is clearly coveted among public figures and b…

    Submitted 12 March, 2019; originally announced March 2019.

  14. arXiv:1812.09710  [pdf, other]

    cs.SI cs.CY

    Elites Tweet? Characterizing the Twitter Verified User Network

    Authors: Indraneil Paul, Abhinav Khattar, Ponnurangam Kumaraguru, Manish Gupta, Shaan Chopra

    Abstract: Social network and publishing platforms, such as Twitter, support the concept of verification. Verified accounts are deemed worthy of platform-wide public interest and are separately authenticated by the platform itself. There have been repeated assertions by these platforms about verification not being tantamount to endorsement. However, a significant body of prior work suggests that possessing a…

    Submitted 12 March, 2019; v1 submitted 23 December, 2018; originally announced December 2018.

  15. arXiv:1802.04168  [pdf, other]

    cs.SI

    Collective Classification of Spam Campaigners on Twitter: A Hierarchical Meta-Path Based Approach

    Authors: Srishti Gupta, Abhinav Khattar, Arpit Gogia, Ponnurangam Kumaraguru, Tanmoy Chakraborty

    Abstract: Cybercriminals have leveraged the popularity of a large user base available on Online Social Networks to spread spam campaigns by propagating phishing URLs, attaching malicious contents, etc. However, another kind of spam attacks using phone numbers has recently become prevalent on OSNs, where spammers advertise phone numbers to attract users' attention and convince them to make a call to these ph…

    Submitted 12 February, 2018; originally announced February 2018.

    Comments: To appear in WWW 2018

  16. arXiv:1801.05588  [pdf, other]

    cs.SI

    White or Blue, the Whale gets its Vengeance: A Social Media Analysis of the Blue Whale Challenge

    Authors: Abhinav Khattar, Karan Dabas, Kshitij Gupta, Shaan Chopra, Ponnurangam Kumaraguru

    Abstract: The Blue Whale Challenge is a series of self-harm causing tasks that are propagated via online social media under the disguise of a "game." The list of tasks must be completed in a duration of 50 days and they cause both physical and mental harm to the player. The final task is to commit suicide. The game is supposed to be administered by people called "curators" who incite others to cause self-mu…

    Submitted 17 January, 2018; originally announced January 2018.

    Comments: 18 pages