-
World Simulation with Video Foundation Models for Physical AI
Authors:
NVIDIA:
Arslan Ali,
Junjie Bai,
Maciej Bala,
Yogesh Balaji,
Aaron Blakeman,
Tiffany Cai,
Jiaxin Cao,
Tianshi Cao,
Elizabeth Cha,
Yu-Wei Chao,
Prithvijit Chattopadhyay,
Mike Chen,
Yongxin Chen,
Yu Chen,
Shuai Cheng,
Yin Cui,
Jenna Diamond,
Yifan Ding,
Jiaojiao Fan,
Linxi Fan,
Liang Feng,
Francesco Ferroni,
Sanja Fidler, et al. (65 additional authors not shown)
Abstract:
We introduce Cosmos-Predict2.5, the latest generation of the Cosmos World Foundation Models for Physical AI. Built on a flow-based architecture, Cosmos-Predict2.5 unifies Text2World, Image2World, and Video2World generation in a single model and leverages Cosmos-Reason1, a Physical AI vision-language model, to provide richer text grounding and finer control of world simulation. Trained on 200M curated video clips and refined with reinforcement learning-based post-training, Cosmos-Predict2.5 achieves substantial improvements over Cosmos-Predict1 in video quality and instruction alignment, with models released at 2B and 14B scales. These capabilities enable more reliable synthetic data generation, policy evaluation, and closed-loop simulation for robotics and autonomous systems. We further extend the family with Cosmos-Transfer2.5, a ControlNet-style framework for Sim2Real and Real2Real world translation. Despite being 3.5$\times$ smaller than Cosmos-Transfer1, it delivers higher fidelity and robust long-horizon video generation. Together, these advances establish Cosmos-Predict2.5 and Cosmos-Transfer2.5 as versatile tools for scaling embodied intelligence. To accelerate research and deployment in Physical AI, we release source code, pretrained checkpoints, and curated benchmarks under the NVIDIA Open Model License at https://github.com/nvidia-cosmos/cosmos-predict2.5 and https://github.com/nvidia-cosmos/cosmos-transfer2.5. We hope these open resources lower the barrier to adoption and foster innovation in building the next generation of embodied intelligence.
Submitted 24 February, 2026; v1 submitted 28 October, 2025;
originally announced November 2025.
-
HausaNLP at SemEval-2025 Task 11: Hausa Text Emotion Detection
Authors:
Sani Abdullahi Sani,
Salim Abubakar,
Falalu Ibrahim Lawan,
Abdulhamid Abubakar,
Maryam Bala
Abstract:
This paper presents our approach to multi-label emotion detection in Hausa, a low-resource African language, for SemEval Track A. We fine-tuned AfriBERTa, a transformer-based model pre-trained on African languages, to classify Hausa text into six emotions: anger, disgust, fear, joy, sadness, and surprise. Our methodology involved data preprocessing, tokenization, and model fine-tuning using the Hugging Face Trainer API. The system achieved a validation accuracy of 74.00%, with an F1-score of 73.50%, demonstrating the effectiveness of transformer-based models for emotion detection in low-resource languages.
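The multi-label decision step described above can be sketched as follows. The six emotion labels come from the task itself; the logit values and function names here are hypothetical illustrations, and the actual system fine-tunes AfriBERTa with the Hugging Face Trainer API rather than using hand-written logits.

```python
import math

# The six emotions targeted in the Hausa track.
EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_emotions(logits, threshold=0.5):
    """Turn one per-emotion logit vector into a multi-label prediction.

    Unlike single-label softmax classification, multi-label detection
    applies an independent sigmoid to each emotion and keeps every
    emotion whose probability clears the threshold, so a sentence can
    carry several emotions at once (or none).
    """
    return [e for e, z in zip(EMOTIONS, logits) if sigmoid(z) >= threshold]

# Hypothetical model output for one sentence.
print(predict_emotions([2.1, -1.3, -0.4, 0.2, 1.7, -2.0]))
# → ['anger', 'joy', 'sadness']
```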
Submitted 23 June, 2025; v1 submitted 19 June, 2025;
originally announced June 2025.
-
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
Authors:
NVIDIA:
Aaron Blakeman,
Aarti Basant,
Abhinav Khattar,
Adithya Renduchintala,
Akhiad Bercovich,
Aleksander Ficek,
Alexis Bjorlin,
Ali Taghibakhshi,
Amala Sanjay Deshmukh,
Ameya Sunil Mahabaleshwarkar,
Andrew Tao,
Anna Shors,
Ashwath Aithal,
Ashwin Poojary,
Ayush Dattagupta,
Balaram Buddharaju,
Bobby Chen,
Boris Ginsburg,
Boxin Wang,
Brandon Norick,
Brian Butterfield,
Bryan Catanzaro,
Carlo del Mundo, et al. (176 additional authors not shown)
Abstract:
As inference-time scaling becomes critical for enhanced reasoning capabilities, it is becoming increasingly important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transformer model architecture with Mamba layers that perform constant computation and require constant memory per generated token. We show that Nemotron-H models offer better or on-par accuracy compared to other similarly sized state-of-the-art open-source Transformer models (e.g., Qwen-2.5-7B/72B and Llama-3.1-8B/70B), while being up to 3$\times$ faster at inference. To further increase inference speed and reduce the memory required at inference time, we created Nemotron-H-47B-Base from the 56B model using MiniPuzzle, a new compression technique based on pruning and distillation. Nemotron-H-47B-Base achieves accuracy similar to the 56B model but is 20% faster to infer. In addition, we introduce an FP8-based training recipe and show that it can achieve results on par with BF16-based training. This recipe is used to train the 56B model. We are releasing Nemotron-H base model checkpoints with support in Hugging Face and NeMo.
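The memory argument behind the hybrid design can be illustrated with a toy cache-size count. The layer counts, sequence length, and state size below are illustrative assumptions, not Nemotron-H's actual configuration; the point is only that attention layers cache keys and values for every past token, while recurrent layers keep a fixed-size state.

```python
def kv_cache_entries(num_attn_layers, seq_len, kv_per_token=2):
    """Self-attention caches keys and values for every past token,
    so per-layer cache size grows linearly with sequence length."""
    return num_attn_layers * seq_len * kv_per_token

def hybrid_cache_entries(num_attn_layers, num_mamba_layers, seq_len,
                         kv_per_token=2, mamba_state=16):
    """In a hybrid stack, each recurrent (Mamba-style) layer carries a
    fixed-size state regardless of how many tokens were generated."""
    return (num_attn_layers * seq_len * kv_per_token
            + num_mamba_layers * mamba_state)

# Pure-Transformer stack: 32 attention layers at 8K context.
full = kv_cache_entries(32, seq_len=8192)
# Hybrid stack: keep 4 attention layers, replace 28 with recurrent ones.
hybrid = hybrid_cache_entries(4, 28, seq_len=8192)
print(full, hybrid, full / hybrid)  # cache shrinks by roughly 8x
```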
Submitted 5 September, 2025; v1 submitted 4 April, 2025;
originally announced April 2025.
-
HausaNLP at SemEval-2025 Task 2: Entity-Aware Fine-tuning vs. Prompt Engineering in Entity-Aware Machine Translation
Authors:
Abdulhamid Abubakar,
Hamidatu Abdulkadir,
Ibrahim Rabiu Abdullahi,
Abubakar Auwal Khalid,
Ahmad Mustapha Wali,
Amina Aminu Umar,
Maryam Bala,
Sani Abdullahi Sani,
Ibrahim Said Ahmad,
Shamsuddeen Hassan Muhammad,
Idris Abdulmumin,
Vukosi Marivate
Abstract:
This paper presents our findings for SemEval 2025 Task 2, a shared task on entity-aware machine translation (EA-MT). The goal of this task is to develop translation models that can accurately translate English sentences into target languages, with a particular focus on handling named entities, which often pose challenges for MT systems. The task covers 10 target languages with English as the source. In this paper, we describe the different systems we employed, detail our results, and discuss insights gained from our experiments.
Submitted 25 March, 2025;
originally announced March 2025.
-
HausaNLP at SemEval-2025 Task 3: Towards a Fine-Grained Model-Aware Hallucination Detection
Authors:
Maryam Bala,
Amina Imam Abubakar,
Abdulhamid Abubakar,
Abdulkadir Shehu Bichi,
Hafsa Kabir Ahmad,
Sani Abdullahi Sani,
Idris Abdulmumin,
Shamsuddeen Hassan Muhammad,
Ibrahim Said Ahmad
Abstract:
This paper presents our findings from the Multilingual Shared Task on Hallucinations and Related Observable Overgeneration Mistakes (MU-SHROOM), which focuses on identifying hallucinations and related overgeneration errors in large language models (LLMs). The shared task involves detecting the specific text spans that constitute hallucinations in LLM outputs across 14 languages. To address this task, we aim to provide a nuanced, model-aware understanding of hallucination occurrences and severity in English. We used natural language inference and fine-tuned a ModernBERT model on a synthetic dataset of 400 samples, achieving an Intersection over Union (IoU) score of 0.032 and a correlation score of 0.422. These results indicate a moderately positive correlation between the model's confidence scores and the actual presence of hallucinations, while the low IoU score indicates little overlap between the predicted hallucination spans and the ground-truth annotations. This performance is unsurprising given the intricate nature of hallucination detection: hallucinations often manifest subtly and depend on context, making their exact boundaries difficult to pinpoint.
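For reference, the span-level IoU metric used above can be sketched as follows. This assumes character-index spans with exclusive end indices; the example spans are hypothetical.

```python
def span_iou(pred, gold):
    """Character-level Intersection over Union between a predicted
    span and a gold span, each given as (start, end) with an
    exclusive end index. 1.0 means perfect overlap, 0.0 none."""
    inter = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union else 0.0

# A predicted hallucination span that barely overlaps the annotated
# one, mirroring the low-IoU regime reported above.
print(span_iou((10, 20), (18, 40)))  # 2 / 30 ≈ 0.0667
```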
Submitted 25 March, 2025;
originally announced March 2025.
-
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
Authors:
NVIDIA:
Hassan Abu Alhaija,
Jose Alvarez,
Maciej Bala,
Tiffany Cai,
Tianshi Cao,
Liz Cha,
Joshua Chen,
Mike Chen,
Francesco Ferroni,
Sanja Fidler,
Dieter Fox,
Yunhao Ge,
Jinwei Gu,
Ali Hassani,
Michael Isaev,
Pooya Jannaty,
Shiyi Lan,
Tobias Lasser,
Huan Ling,
Ming-Yu Liu,
Xian Liu,
Yifan Lu,
Alice Luo, et al. (16 additional authors not shown)
Abstract:
We introduce Cosmos-Transfer, a conditional world generation model that can generate world simulations based on multiple spatial control inputs of various modalities such as segmentation, depth, and edge. In the design, the spatial conditional scheme is adaptive and customizable. It allows weighting different conditional inputs differently at different spatial locations. This enables highly controllable world generation and finds use in various world-to-world transfer use cases, including Sim2Real. We conduct extensive evaluations to analyze the proposed model and demonstrate its applications for Physical AI, including robotics Sim2Real and autonomous vehicle data enrichment. We further demonstrate an inference scaling strategy to achieve real-time world generation with an NVIDIA GB200 NVL72 rack. To help accelerate research development in the field, we open-source our models and code at https://github.com/nvidia-cosmos/cosmos-transfer1.
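A minimal sketch of the spatially adaptive weighting idea, assuming per-modality control grids and per-modality weight maps of the same shape: at each spatial location, each modality's contribution is scaled by its own weight there. The function and blending rule are illustrative, not the model's actual conditioning mechanism.

```python
def blend_controls(controls, weights):
    """Combine per-modality control signals with spatially varying
    weights, normalizing by the total weight at each location.

    controls: {modality: 2D grid of floats}
    weights:  {modality: 2D grid of non-negative floats, same shape}
    """
    names = list(controls)
    h = len(controls[names[0]])
    w = len(controls[names[0]][0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = sum(weights[n][i][j] for n in names)
            if total > 0:
                out[i][j] = sum(
                    weights[n][i][j] * controls[n][i][j] for n in names
                ) / total
    return out

# Depth fully controls the left location, edge the right one.
controls = {"depth": [[1.0, 1.0]], "edge": [[0.0, 0.0]]}
weights = {"depth": [[1.0, 0.0]], "edge": [[0.0, 1.0]]}
print(blend_controls(controls, weights))  # [[1.0, 0.0]]
```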
Submitted 1 April, 2025; v1 submitted 18 March, 2025;
originally announced March 2025.
-
Cosmos World Foundation Model Platform for Physical AI
Authors:
NVIDIA:
Niket Agarwal,
Arslan Ali,
Maciej Bala,
Yogesh Balaji,
Erik Barker,
Tiffany Cai,
Prithvijit Chattopadhyay,
Yongxin Chen,
Yin Cui,
Yifan Ding,
Daniel Dworakowski,
Jiaojiao Fan,
Michele Fenzi,
Francesco Ferroni,
Sanja Fidler,
Dieter Fox,
Songwei Ge,
Yunhao Ge,
Jinwei Gu,
Siddharth Gururani,
Ethan He,
Jiahui Huang,
Jacob Huffman, et al. (54 additional authors not shown)
Abstract:
Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications. Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers. To help Physical AI builders solve the most critical problems of our society, we make Cosmos open-source and our models open-weight with permissive licenses available via https://github.com/nvidia-cosmos/cosmos-predict1.
Submitted 9 July, 2025; v1 submitted 7 January, 2025;
originally announced January 2025.
-
Edify 3D: Scalable High-Quality 3D Asset Generation
Authors:
NVIDIA:
Maciej Bala,
Yin Cui,
Yifan Ding,
Yunhao Ge,
Zekun Hao,
Jon Hasselgren,
Jacob Huffman,
Jingyi Jin,
J. P. Lewis,
Zhaoshuo Li,
Chen-Hsuan Lin,
Yen-Chen Lin,
Tsung-Yi Lin,
Ming-Yu Liu,
Alice Luo,
Qianli Ma,
Jacob Munkberg,
Stella Shi,
Fangyin Wei,
Donglai Xiang,
Jiashu Xu,
Xiaohui Zeng,
Qinsheng Zhang
Abstract:
We introduce Edify 3D, an advanced solution designed for high-quality 3D asset generation. Our method first synthesizes RGB and surface normal images of the described object at multiple viewpoints using a diffusion model. The multi-view observations are then used to reconstruct the shape, texture, and PBR materials of the object. Our method can generate high-quality 3D assets with detailed geometry, clean shape topologies, high-resolution textures, and materials within 2 minutes of runtime.
Submitted 11 November, 2024;
originally announced November 2024.
-
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
Authors:
NVIDIA:
Yuval Atzmon,
Maciej Bala,
Yogesh Balaji,
Tiffany Cai,
Yin Cui,
Jiaojiao Fan,
Yunhao Ge,
Siddharth Gururani,
Jacob Huffman,
Ronald Isaac,
Pooya Jannaty,
Tero Karras,
Grace Lam,
J. P. Lewis,
Aaron Licata,
Yen-Chen Lin,
Ming-Yu Liu,
Qianli Ma,
Arun Mallya,
Ashlee Martino-Tarr,
Doug Mendez,
Seungjun Nah,
Chris Pruett, et al. (7 additional authors not shown)
Abstract:
We introduce Edify Image, a family of diffusion models capable of generating photorealistic image content with pixel-perfect accuracy. Edify Image utilizes cascaded pixel-space diffusion models trained using a novel Laplacian diffusion process, in which image signals at different frequency bands are attenuated at varying rates. Edify Image supports a wide range of applications, including text-to-image synthesis, 4K upsampling, ControlNets, 360 HDR panorama generation, and finetuning for image customization.
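The band-splitting idea behind a Laplacian decomposition can be sketched in 1D. This illustrates only the frequency-band separation (each band is the difference between successive low-pass versions, and summing all bands back onto the base reconstructs the signal exactly); it is not the paper's diffusion process, and all names here are hypothetical.

```python
def smooth(signal):
    """Low-pass a 1D signal with a simple 3-tap moving average
    (edge samples replicated)."""
    n = len(signal)
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
            for i in range(n)]

def laplacian_bands(signal, levels=3):
    """Split a signal into band-pass (detail) layers plus a residual
    low-frequency base, as in a Laplacian pyramid."""
    bands, current = [], list(signal)
    for _ in range(levels):
        low = smooth(current)
        bands.append([c - l for c, l in zip(current, low)])
        current = low
    return bands, current

def reconstruct(bands, base):
    """Perfect reconstruction: the band differences telescope."""
    out = list(base)
    for band in reversed(bands):
        out = [o + b for o, b in zip(out, band)]
    return out

sig = [0.0, 1.0, 0.0, -1.0, 0.0, 2.0]
bands, base = laplacian_bands(sig)
print(reconstruct(bands, base))  # recovers sig up to rounding
```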
Submitted 11 November, 2024;
originally announced November 2024.
-
A Survey on Open-Source-Defined Wireless Networks: Framework, Key Technology, and Implementation
Authors:
Liqiang Zhao,
Muhammad Muhammad Bala,
Wu Gang,
Pan Chengkang,
Yuan Yannan,
Tian Zhigang,
Yu-Chee Tseng,
Chen Xiang,
Bin Shen,
Chih-Lin I
Abstract:
The realization of open-source-defined wireless networks in the telecommunication domain is accomplished through the fifth-generation network (5G). In contrast to its predecessors (3G and 4G), the 5G network can support a wide variety of heterogeneous use cases with challenging requirements from both the Internet and the Internet of Things (IoT). The future sixth-generation (6G) network will not only extend 5G capabilities but also introduce new functionalities to address emerging academic and engineering challenges. The research community has identified that these challenges can be overcome by open-source-defined wireless networks, which are based on open-source software and hardware. In this survey, we present an overview of different aspects of open-source-defined wireless networks, comprising motivation, frameworks, key technologies, and implementation. We start by introducing the motivation and then explore several frameworks, classified into three categories: black-box, grey-box, and white-box. We review research efforts related to the open-source-defined Core Network (CN), Radio Access Network (RAN), and Multi-access Edge Computing (MEC), as well as security threats, open-source hardware, and various implementations, including testbeds. Finally, and most importantly, we include lessons learned, future research directions, open research issues, pitfalls, and the limitations of existing surveys on open-source wireless networks to motivate and encourage future research.
Submitted 5 September, 2022;
originally announced September 2022.
-
New Failure Rate Model for Iterative Software Development Life Cycle Process
Authors:
Sangeeta,
Kapil Sharma,
Manju Bala
Abstract:
Software reliability models are among the most widely used mathematical tools for estimating the reliability, failure rate, and number of remaining faults in software. Existing software reliability models are designed to follow the waterfall software development life cycle and do not take advantage of the iterative development process. In this paper, a new failure rate model centered on the iterative software development life cycle is developed. It integrates a new modulation factor to incorporate the varying needs of each phase of the iterative development process, and it accounts for imperfect debugging, with the possibility of fault introduction and the removal of multiple faults in an interval as iterative development of the software proceeds. The proposed model has been validated on twelve iterations of an Eclipse software failure dataset and nine iterations of a Java Development Toolkit (JDT) software failure dataset. Parameter estimation for the proposed model is performed with a hybrid of Particle Swarm Optimization and the Gravitational Search Algorithm. Experimental results in terms of goodness of fit show that the proposed model outperforms the Jelinski-Moranda, Schick-Wolverton, Goel-Okumoto imperfect debugging, GS Mahapatra, and modified Schick-Wolverton models in 83.33% of iterations for the Eclipse dataset and 77.77% of iterations for the JDT dataset.
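As context for the comparison, the Jelinski-Moranda baseline mentioned above has a particularly simple failure-rate form, sketched here with illustrative parameter values (it assumes perfect debugging and no fault introduction, which is exactly what the proposed model relaxes).

```python
def jelinski_moranda_rates(total_faults, phi, observed_failures):
    """Failure rate before the i-th failure under the classic
    Jelinski-Moranda model: lambda_i = phi * (N - i + 1), so the
    rate drops by phi each time a fault is found and removed."""
    return [phi * (total_faults - i + 1)
            for i in range(1, observed_failures + 1)]

# N = 10 latent faults, per-fault hazard phi = 0.25 (illustrative).
print(jelinski_moranda_rates(10, 0.25, 4))  # [2.5, 2.25, 2.0, 1.75]
```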
Submitted 2 October, 2019;
originally announced October 2019.
-
Military Dog Based Optimizer and its Application to Fake Review
Authors:
Ashish Kumar Tripathi,
Kapil Sharma,
Manju Bala
Abstract:
Over the last three decades, more than sixty meta-heuristic algorithms have been proposed. Such algorithms are inspired by physical phenomena, animal behavior, or evolutionary concepts, and they have been widely used for solving various real-world optimization problems. Researchers are continuously working to improve existing algorithms and to propose new ones that give competitive results compared to the algorithms in the literature. In this paper, a novel meta-heuristic algorithm based on a military dog squad is introduced. The proposed algorithm mimics the searching capability of trained military dogs: military dogs have a strong sense of smell, by which they can find suspicious objects such as bombs, wildlife scats, currency, or blood, and they can communicate with each other by barking. The performance of the proposed algorithm is tested on 17 benchmark functions and compared with five other meta-heuristics, namely particle swarm optimization (PSO), the multi-verse optimizer (MVO), the genetic algorithm (GA), population-based incremental learning (PBIL), and evolution strategy (ES). The results are validated in terms of the mean and standard deviation of the fitness value, and the convergence behavior and consistency of the results are further validated with convergence graphs and box plots. Finally, the proposed algorithm is successfully applied to the real-world fake review detection problem. The experimental results demonstrate that the proposed algorithm outperforms the other considered algorithms on the majority of performance parameters.
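A generic population-based metaheuristic of the kind described can be sketched as follows. This is an illustrative scheme under assumed update rules, not the authors' actual Military Dog Based Optimizer: each member moves partly toward the best solution found so far (the "barking" communication) and partly at random (the "sniffing" search).

```python
import random

def sphere(x):
    """Benchmark objective: minimum 0 at the origin."""
    return sum(v * v for v in x)

def squad_search(obj, dim=5, pack=20, iters=200, seed=0):
    """Minimize obj with a pack of candidate solutions: greedy
    acceptance of moves that mix attraction to the current best
    with small random exploration."""
    rng = random.Random(seed)
    dogs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pack)]
    best = min(dogs, key=obj)
    for _ in range(iters):
        for k, dog in enumerate(dogs):
            step = [0.5 * (b - d) + 0.1 * rng.uniform(-1, 1)
                    for b, d in zip(best, dog)]
            candidate = [d + s for d, s in zip(dog, step)]
            if obj(candidate) < obj(dog):  # keep only improving moves
                dogs[k] = candidate
        best = min(min(dogs, key=obj), best, key=obj)
    return best

best = squad_search(sphere)
print(sphere(best))  # close to 0 after 200 iterations
```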
Submitted 26 September, 2019;
originally announced September 2019.
-
Intelligent Agent for Prediction in E- Negotiation: An Approach
Authors:
Mohammad Irfan Bala,
Sheetal Vij,
Debajyoti Mukhopadhyay
Abstract:
With the proliferation of web technologies, it becomes more and more important to make the traditional negotiation pricing mechanism automated and intelligent. The behaviour of software agents that negotiate on behalf of humans is determined by their tactics, expressed as decision functions. Predicting a partner's behaviour in negotiation has been an active research direction in recent years, as it can improve the utility gain for an adaptive negotiation agent and help reach agreement more quickly or secure greater benefits. In this paper, we review various negotiation methods and existing architectures. Although negotiation is a very complex activity to automate without human intervention, we propose an architecture for predicting the opponent's behaviour that takes into consideration the various factors affecting the negotiation process. The basic idea is that information about negotiators, their individual actions, and their dynamics can be used by software agents equipped with adaptive capabilities to learn from past negotiations and to assist in selecting appropriate negotiation tactics.
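As an example of a decision function of the kind referred to above, the classic time-dependent concession tactic can be sketched as follows. This is a standard formulation from the automated-negotiation literature, not this paper's specific architecture; the parameter values are illustrative.

```python
def time_dependent_offer(t, deadline, p_min, p_max, beta):
    """Seller's offer at time t under a time-dependent concession
    tactic: the offer slides from p_max down to p_min as the deadline
    approaches. beta < 1 concedes slowly at first (Boulware),
    beta > 1 concedes quickly (Conceder), beta = 1 is linear."""
    alpha = (t / deadline) ** (1.0 / beta)
    return p_max - alpha * (p_max - p_min)

# Linear concession: halfway to the deadline gives the midpoint price.
print(time_dependent_offer(5, 10, 100.0, 200.0, 1.0))  # 150.0
```

An opponent model can be built by fitting such a curve to the offers observed so far and extrapolating the opponent's next concession.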
Submitted 25 November, 2013;
originally announced November 2013.