RohollahHS

Starred repositories

Python 51 3 Updated Feb 12, 2026

A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.

76 7 Updated Mar 18, 2025

Official code and dataset for our ICCV 2025 paper: MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models

10 Updated Oct 21, 2025

This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025

Python 7,309 549 Updated May 5, 2025

The official implementation of “ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought”

Python 49 4 Updated Feb 2, 2026

The official implementation of "DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation". (arXiv 2601.22153)

197 5 Updated Jan 30, 2026

VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning.

Python 27 Updated Feb 20, 2026

[ICLR 2026] Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation

Python 127 2 Updated Feb 15, 2026

Visual Causal Flow

Python 2,710 226 Updated Feb 3, 2026

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

Python 1,148 153 Updated Sep 9, 2025

RLinf: Reinforcement Learning Infrastructure for Embodied and Agentic AI

Python 3,128 405 Updated Apr 17, 2026

Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

Python 118 10 Updated Apr 1, 2026

Towards Efficient Multimodal Large Language Models: A Survey on Token Compression

160 4 Updated Apr 16, 2026

Convert any PDF into a podcast episode!

Python 2,580 280 Updated Dec 7, 2024

[CVPR 2026] Official codes of "Monet: Reasoning in Latent Visual Space Beyond Image and Language"

Python 166 2 Updated Mar 19, 2026

Agent0 Series: Self-Evolving Agents from Zero Data

Python 1,164 138 Updated Feb 17, 2026

Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens"

Python 344 18 Updated Apr 17, 2026

[CVPR 2026 (Findings) 🔥🔥] Self Evolving Large Multimodal Models with Continuous Rewards

Python 22 2 Updated Mar 5, 2026

Official implementation of "Continuous Autoregressive Language Models"

Python 799 92 Updated Feb 11, 2026

Official codebase for the paper Latent Visual Reasoning

Python 146 7 Updated Oct 22, 2025
Python 1,196 73 Updated Nov 20, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 18,988 1,735 Updated Jan 30, 2026

Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"

Python 129 9 Updated Feb 4, 2026

✨ [ICLR'26] WithAnyone is capable of generating high-quality, controllable, and ID consistent images

Python 563 21 Updated Mar 21, 2026

Contexts Optical Compression

Python 22,844 2,105 Updated Jan 27, 2026

Official implementation of Spatial-Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model [ICLR2026]

Python 219 12 Updated Mar 12, 2026
Jupyter Notebook 1,778 112 Updated Nov 5, 2025

Official repository for OmniVLA training and inference code

Python 229 41 Updated Mar 25, 2026

LLM101n: Let's build a Storyteller

36,799 2,011 Updated Aug 1, 2024