SonoWorld: From One Image to a 3D Audio-Visual Scene

Authors: Derong Jin, Xiyi Chen, Ming C. Lin, Ruohan Gao

Abstract: Tremendous progress in visual scene generation now turns a single image into an explorable 3D world, yet immersion remains incomplete without sound. We introduce Image2AVScene, the task of generating a 3D audio-visual scene from a single image, and present SonoWorld, the first framework to tackle this challenge. From one image, our pipeline outpaints a 360° panorama, lifts it into a navigable 3D s… ▽ More Tremendous progress in visual scene generation now turns a single image into an explorable 3D world, yet immersion remains incomplete without sound. We introduce Image2AVScene, the task of generating a 3D audio-visual scene from a single image, and present SonoWorld, the first framework to tackle this challenge. From one image, our pipeline outpaints a 360° panorama, lifts it into a navigable 3D scene, places language-guided sound anchors, and renders ambisonics for point, areal, and ambient sources, yielding spatial audio aligned with scene geometry and semantics. Quantitative evaluations on a newly curated real-world dataset and a controlled user study confirm the effectiveness of our approach. Beyond free-viewpoint audio-visual rendering, we also demonstrate applications to one-shot acoustic learning and audio-visual spatial source separation. Project website: https://humathe.github.io/sonoworld/ △ Less

Submitted 30 March, 2026; originally announced March 2026.

Comments: Accepted by CVPR 2026, project page: https://humathe.github.io/sonoworld/

arXiv:2603.06533 [pdf, ps, other]

NEGATE: Constrained Semantic Guidance for Linguistic Negation in Text-to-Video Diffusion

Authors: Taewon Kang, Ming C. Lin

Abstract: Negation is a fundamental linguistic operator, yet it remains inadequately modeled in diffusion-based generative systems. In this work, we present a formal treatment of linguistic negation in diffusion-based generative models by modeling it as a structured feasibility constraint on semantic guidance within diffusion dynamics. Rather than introducing heuristics or retraining model parameters, we re… ▽ More Negation is a fundamental linguistic operator, yet it remains inadequately modeled in diffusion-based generative systems. In this work, we present a formal treatment of linguistic negation in diffusion-based generative models by modeling it as a structured feasibility constraint on semantic guidance within diffusion dynamics. Rather than introducing heuristics or retraining model parameters, we reinterpret classifier-free guidance as defining a semantic update direction and enforce negation by projecting the update onto a convex constraint set derived from linguistic structure. This novel formulation provides a unified framework for handling diverse negation phenomena, including object absence, graded non-inversion semantics, multi-negation composition, and scope-sensitive disambiguation. Our approach is training-free, compatible with pretrained diffusion backbones, and naturally extends from image generation to temporally evolving video trajectories. In addition, we introduce a structured negation-centric benchmark suite that isolates distinct linguistic failure modes in generative systems, to further research in this area. Experiments demonstrate that our method achieves robust negation compliance while preserving visual fidelity and structural coherence, establishing the first unified formulation of linguistic negation in diffusion-based generative models beyond representation-level evaluation. △ Less

Submitted 6 March, 2026; originally announced March 2026.

Comments: 50 pages, 32 figures

arXiv:2602.19035 [pdf, ps, other]

OpenVO: Open-World Visual Odometry with Temporal Dynamics Awareness

Authors: Phuc D. A. Nguyen, Anh N. Nhu, Ming C. Lin

Abstract: We introduce OpenVO, a novel framework for Open-world Visual Odometry (VO) with temporal awareness under limited input conditions. OpenVO effectively estimates real-world-scale ego-motion from monocular dashcam footage with varying observation rates and uncalibrated cameras, enabling robust trajectory dataset construction from rare driving events recorded in dashcam. Existing VO methods are traine… ▽ More We introduce OpenVO, a novel framework for Open-world Visual Odometry (VO) with temporal awareness under limited input conditions. OpenVO effectively estimates real-world-scale ego-motion from monocular dashcam footage with varying observation rates and uncalibrated cameras, enabling robust trajectory dataset construction from rare driving events recorded in dashcam. Existing VO methods are trained on fixed observation frequency (e.g., 10Hz or 12Hz), completely overlooking temporal dynamics information. Many prior methods also require calibrated cameras with known intrinsic parameters. Consequently, their performance degrades when (1) deployed under unseen observation frequencies or (2) applied to uncalibrated cameras. These significantly limit their generalizability to many downstream tasks, such as extracting trajectories from dashcam footage. To address these challenges, OpenVO (1) explicitly encodes temporal dynamics information within a two-frame pose regression framework and (2) leverages 3D geometric priors derived from foundation models. We validate our method on three major autonomous-driving benchmarks - KITTI, nuScenes, and Argoverse 2 - achieving more than 20 performance improvement over state-of-the-art approaches. Under varying observation rate settings, our method is significantly more robust, achieving 46%-92% lower errors across all metrics. These results demonstrate the versatility of OpenVO for real-world 3D reconstruction and diverse downstream applications. △ Less

Submitted 21 February, 2026; originally announced February 2026.

Comments: Main paper CVPR 2026

arXiv:2512.17151 [pdf, ps, other]

Text-Conditioned Background Generation for Editable Multi-Layer Documents

Authors: Taewon Kang, Joseph K J, Chris Tensmeyer, Jihyung Kil, Wanrong Zhu, Ming C. Lin, Vlad I. Morariu

Abstract: We present a framework for document-centric background generation with multi-page editing and thematic continuity. To ensure text regions remain readable, we employ a \emph{latent masking} formulation that softly attenuates updates in the diffusion space, inspired by smooth barrier functions in physics and numerical optimization. In addition, we introduce \emph{Automated Readability Optimization (… ▽ More We present a framework for document-centric background generation with multi-page editing and thematic continuity. To ensure text regions remain readable, we employ a \emph{latent masking} formulation that softly attenuates updates in the diffusion space, inspired by smooth barrier functions in physics and numerical optimization. In addition, we introduce \emph{Automated Readability Optimization (ARO)}, which automatically places semi-transparent, rounded backing shapes behind text regions. ARO determines the minimal opacity needed to satisfy perceptual contrast standards (WCAG 2.2) relative to the underlying background, ensuring readability while maintaining aesthetic harmony without human intervention. Multi-page consistency is maintained through a summarization-and-instruction process, where each page is distilled into a compact representation that recursively guides subsequent generations. This design reflects how humans build continuity by retaining prior context, ensuring that visual motifs evolve coherently across an entire document. Our method further treats a document as a structured composition in which text, figures, and backgrounds are preserved or regenerated as separate layers, allowing targeted background editing without compromising readability. Finally, user-provided prompts allow stylistic adjustments in color and texture, balancing automated consistency with flexible customization. Our training-free framework produces visually coherent, text-preserving, and thematically aligned documents, bridging generative modeling with natural design workflows. △ Less

Submitted 18 December, 2025; originally announced December 2025.

arXiv:2512.06818 [pdf, ps, other]

MeshSplatting: Differentiable Rendering with Opaque Meshes

Authors: Jan Held, Sanghyun Son, Renaud Vandeghen, Daniel Rebain, Matheus Gadelha, Yi Zhou, Anthony Cioppa, Ming C. Lin, Marc Van Droogenbroeck, Andrea Tagliasacchi

Abstract: Primitive-based splatting methods like 3D Gaussian Splatting have revolutionized novel view synthesis with real-time rendering. However, their point-based representations remain incompatible with mesh-based pipelines that power AR/VR and game engines. We present MeshSplatting, a mesh-based reconstruction approach that jointly optimizes geometry and appearance through differentiable rendering. By e… ▽ More Primitive-based splatting methods like 3D Gaussian Splatting have revolutionized novel view synthesis with real-time rendering. However, their point-based representations remain incompatible with mesh-based pipelines that power AR/VR and game engines. We present MeshSplatting, a mesh-based reconstruction approach that jointly optimizes geometry and appearance through differentiable rendering. By enforcing connectivity via restricted Delaunay triangulation and refining surface consistency, MeshSplatting creates end-to-end smooth, visually high-quality meshes that render efficiently in real-time 3D engines. On Mip-NeRF360, it boosts PSNR by +0.69 dB over the current state-of-the-art MiLo for mesh-based novel view synthesis, while training 2x faster and using 2x less memory, bridging neural rendering and interactive 3D graphics for seamless real-time scene interaction. The project page is available at https://meshsplatting.github.io/. △ Less

Submitted 7 December, 2025; originally announced December 2025.

arXiv:2511.01259 [pdf, ps, other]

doi 10.1145/3757377.3763903

An Adjoint Method for Differentiable Fluid Simulation on Flow Maps

Authors: Zhiqi Li, Jinjin He, Barnabás Börcsök, Taiyuan Zhang, Duowen Chen, Tao Du, Ming C. Lin, Greg Turk, Bo Zhu

Abstract: This paper presents a novel adjoint solver for differentiable fluid simulation based on bidirectional flow maps. Our key observation is that the forward fluid solver and its corresponding backward, adjoint solver share the same flow map as the forward simulation. In the forward pass, this map transports fluid impulse variables from the initial frame to the current frame to simulate vortical dynami… ▽ More This paper presents a novel adjoint solver for differentiable fluid simulation based on bidirectional flow maps. Our key observation is that the forward fluid solver and its corresponding backward, adjoint solver share the same flow map as the forward simulation. In the forward pass, this map transports fluid impulse variables from the initial frame to the current frame to simulate vortical dynamics. In the backward pass, the same map propagates adjoint variables from the current frame back to the initial frame to compute gradients. This shared long-range map allows the accuracy of gradient computation to benefit directly from improvements in flow map construction. Building on this insight, we introduce a novel adjoint solver that solves the adjoint equations directly on the flow map, enabling long-range and accurate differentiation of incompressible flows without differentiating intermediate numerical steps or storing intermediate variables, as required in conventional adjoint methods. To further improve efficiency, we propose a long-short time-sparse flow map representation for evolving adjoint variables. Our approach has low memory usage, requiring only 6.53GB of data at a resolution of $192^3$ while preserving high accuracy in tracking vorticity, enabling new differentiable simulation tasks that require precise identification, prediction, and control of vortex dynamics. △ Less

Submitted 3 November, 2025; originally announced November 2025.

Comments: 15 pages, 16 figures

Journal ref: ACM SIGGRAPH Asia Conference Proceedings (2025)

arXiv:2510.17665 [pdf, ps, other]

Efficient recognition algorithms for $(1,2)$-, $(2,1)$- and $(2,2)$-graphs

Authors: Flavia Bonomo-Braberman, Min Chih Lin, Ignacio Maqueda

Abstract: A graph $G$ is said to be a $(k,\ell)$-graph if its vertex set can be partitioned into $k$ independent sets and $\ell$ cliques. It is well established that the recognition problem for $(k,\ell)$-graphs is NP-complete whenever $k \geq 3$ or $\ell \geq 3$, while it is solvable in polynomial time otherwise. In particular, for the case $k+\ell \leq 2$, recognition can be carried out in linear time, si… ▽ More A graph $G$ is said to be a $(k,\ell)$-graph if its vertex set can be partitioned into $k$ independent sets and $\ell$ cliques. It is well established that the recognition problem for $(k,\ell)$-graphs is NP-complete whenever $k \geq 3$ or $\ell \geq 3$, while it is solvable in polynomial time otherwise. In particular, for the case $k+\ell \leq 2$, recognition can be carried out in linear time, since split graphs coincide with the class of $(1,1)$-graphs, bipartite graphs correspond precisely to $(2,0)$-graphs, and $(\ell,k)$-graphs are the complements of $(k,\ell)$-graphs. Recognition algorithms for $(2,1)$- and $(1,2)$-graphs were provided by Brandstädt, Le and Szymczak in 1998, while the case of $(2,2)$-graphs was addressed by Feder, Hell, Klein, and Motwani in 1999. In this work, we refine these results by presenting improved recognition algorithms with lower time complexity. Specifically, we reduce the running time from $O((n+m)^2)$ to $O(n^2+nm)$ for $(2,1)$-graphs, from $O((n+\overline{m})^2)$ to $O(n^2+n\overline{m})$ for $(1,2)$-graphs, and from $O(n^{10}(n+m))$ to $O(n^4 (n+\min\{m,\overline{m}\})^3)$ for $(2,2)$-graphs. Here, $n$ and $m$ denote the number of vertices and edges of the input graph $G$, respectively, and $\overline{m}$ denotes the number of edges in the complement of $G$. △ Less

Submitted 20 October, 2025; originally announced October 2025.

Comments: 14 pages, 0 figures

MSC Class: 05C85

arXiv:2510.07807 [pdf, ps, other]

GM3: A General Physical Model for Micro-Mobility Vehicles

Authors: Grace Cai, Nithin Parepally, Laura Zheng, Ming C. Lin

Abstract: Modeling the dynamics of micro-mobility vehicles (MMV) is becoming increasingly important for training autonomous vehicle systems and building urban traffic simulations. However, mainstream tools rely on variants of the Kinematic Bicycle Model (KBM) or mode-specific physics that miss tire slip, load transfer, and rider/vehicle lean. To our knowledge, no unified, physics-based model captures these… ▽ More Modeling the dynamics of micro-mobility vehicles (MMV) is becoming increasingly important for training autonomous vehicle systems and building urban traffic simulations. However, mainstream tools rely on variants of the Kinematic Bicycle Model (KBM) or mode-specific physics that miss tire slip, load transfer, and rider/vehicle lean. To our knowledge, no unified, physics-based model captures these dynamics across the full range of common MMVs and wheel layouts. We propose the "Generalized Micro-mobility Model" (GM3), a tire-level formulation based on the tire brush representation that supports arbitrary wheel configurations, including single/double track and multi-wheel platforms. We introduce an interactive model-agnostic simulation framework that decouples vehicle/layout specification from dynamics to compare the GM3 with the KBM and other models, consisting of fixed step RK4 integration, human-in-the-loop and scripted control, real-time trajectory traces and logging for analysis. We also empirically validate the GM3 on the Stanford Drone Dataset's deathCircle (roundabout) scene for biker, skater, and cart classes. △ Less

Submitted 14 March, 2026; v1 submitted 9 October, 2025; originally announced October 2025.

arXiv:2509.25122 [pdf, ps, other]

Triangle Splatting+: Differentiable Rendering with Opaque Triangles

Authors: Jan Held, Renaud Vandeghen, Sanghyun Son, Daniel Rebain, Matheus Gadelha, Yi Zhou, Ming C. Lin, Marc Van Droogenbroeck, Andrea Tagliasacchi

Abstract: Reconstructing 3D scenes and synthesizing novel views has seen rapid progress in recent years. Neural Radiance Fields demonstrated that continuous volumetric radiance fields can achieve high-quality image synthesis, but their long training and rendering times limit practicality. 3D Gaussian Splatting (3DGS) addressed these issues by representing scenes with millions of Gaussians, enabling real-tim… ▽ More Reconstructing 3D scenes and synthesizing novel views has seen rapid progress in recent years. Neural Radiance Fields demonstrated that continuous volumetric radiance fields can achieve high-quality image synthesis, but their long training and rendering times limit practicality. 3D Gaussian Splatting (3DGS) addressed these issues by representing scenes with millions of Gaussians, enabling real-time rendering and fast optimization. However, Gaussian primitives are not natively compatible with the mesh-based pipelines used in VR headsets, and real-time graphics applications. Existing solutions attempt to convert Gaussians into meshes through post-processing or two-stage pipelines, which increases complexity and degrades visual quality. In this work, we introduce Triangle Splatting+, which directly optimizes triangles, the fundamental primitive of computer graphics, within a differentiable splatting framework. We formulate triangle parametrization to enable connectivity through shared vertices, and we design a training strategy that enforces opaque triangles. The final output is immediately usable in standard graphics engines without post-processing. Experiments on the Mip-NeRF360 and Tanks & Temples datasets show that Triangle Splatting+achieves state-of-the-art performance in mesh-based novel view synthesis. Our method surpasses prior splatting approaches in visual fidelity while remaining efficient and fast to training. Moreover, the resulting semi-connected meshes support downstream applications such as physics-based simulation or interactive walkthroughs. The project page is https://trianglesplatting2.github.io/trianglesplatting2/. △ Less

Submitted 29 September, 2025; originally announced September 2025.

Comments: 9 pages, 6 figures, 2 tables

arXiv:2506.01167 [pdf, ps, other]

Accelerated Learning with Linear Temporal Logic using Differentiable Simulation

Authors: Alper Kamil Bozkurt, Calin Belta, Ming C. Lin

Abstract: Ensuring that reinforcement learning (RL) controllers satisfy safety and reliability constraints in real-world settings remains challenging: state-avoidance and constrained Markov decision processes often fail to capture trajectory-level requirements or induce overly conservative behavior. Formal specification languages such as linear temporal logic (LTL) offer correct-by-construction objectives,… ▽ More Ensuring that reinforcement learning (RL) controllers satisfy safety and reliability constraints in real-world settings remains challenging: state-avoidance and constrained Markov decision processes often fail to capture trajectory-level requirements or induce overly conservative behavior. Formal specification languages such as linear temporal logic (LTL) offer correct-by-construction objectives, yet their rewards are typically sparse, and heuristic shaping can undermine correctness. We introduce, to our knowledge, the first end-to-end framework that integrates LTL with differentiable simulators, enabling efficient gradient-based learning directly from formal specifications. Our method relaxes discrete automaton transitions via soft labeling of states, yielding differentiable rewards and state representations that mitigate the sparsity issue intrinsic to LTL while preserving objective soundness. We provide theoretical guarantees connecting Büchi acceptance to both discrete and differentiable LTL returns and derive a tunable bound on their discrepancy in deterministic and stochastic settings. Empirically, across complex, nonlinear, contact-rich continuous-control tasks, our approach substantially accelerates training and achieves up to twice the returns of discrete baselines. We further demonstrate compatibility with reward machines, thereby covering co-safe LTL and LTL$_\text{f}$ without modification. By rendering automaton-based rewards differentiable, our work bridges formal methods and deep RL, enabling safe, specification-driven learning in continuous domains. △ Less

Submitted 2 April, 2026; v1 submitted 1 June, 2025; originally announced June 2025.

arXiv:2505.21880 [pdf]

doi 10.17605/OSF.IO/ABYQH

Incorporating LLMs for Large-Scale Urban Complex Mobility Simulation

Authors: Yu-Lun Song, Chung-En Tsern, Che-Cheng Wu, Yu-Ming Chang, Syuan-Bo Huang, Wei-Chu Chen, Michael Chia-Liang Lin, Yu-Ta Lin

Abstract: This study presents an innovative approach to urban mobility simulation by integrating a Large Language Model (LLM) with Agent-Based Modeling (ABM). Unlike traditional rule-based ABM, the proposed framework leverages LLM to enhance agent diversity and realism by generating synthetic population profiles, allocating routine and occasional locations, and simulating personalized routes. Using real-wor… ▽ More This study presents an innovative approach to urban mobility simulation by integrating a Large Language Model (LLM) with Agent-Based Modeling (ABM). Unlike traditional rule-based ABM, the proposed framework leverages LLM to enhance agent diversity and realism by generating synthetic population profiles, allocating routine and occasional locations, and simulating personalized routes. Using real-world data, the simulation models individual behaviors and large-scale mobility patterns in Taipei City. Key insights, such as route heat maps and mode-specific indicators, provide urban planners with actionable information for policy-making. Future work focuses on establishing robust validation frameworks to ensure accuracy and reliability in urban planning applications. △ Less

Submitted 3 July, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

Comments: 8 pages, 8 figures. This paper is reviewed and accepted by the CUPUM (Computational Urban Planning and Urban Management) Conference held by University College London (UCL) in 2025

arXiv:2505.16819 [pdf, ps, other]

Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts

Authors: Taewon Kang, Ming C. Lin

Abstract: Recent advances in scene-based video generation have enabled systems to synthesize coherent visual narratives from structured prompts. However, a crucial dimension of storytelling -- character-driven dialogue and speech -- remains underexplored. In this paper, we present a modular pipeline that transforms action-level prompts into visually and auditorily grounded narrative dialogue, enriching visu… ▽ More Recent advances in scene-based video generation have enabled systems to synthesize coherent visual narratives from structured prompts. However, a crucial dimension of storytelling -- character-driven dialogue and speech -- remains underexplored. In this paper, we present a modular pipeline that transforms action-level prompts into visually and auditorily grounded narrative dialogue, enriching visual storytelling with natural voice and character expression. Our method takes as input a pair of prompts per scene, where the first defines the setting and the second specifies a character's behavior. While a story generation model such as Text2Story produces the corresponding visual scene, we focus on generating expressive, character-consistent utterances grounded in both the prompts and the scene image. A pretrained vision-language encoder extracts high-level semantic features from a representative frame, capturing salient visual context. These features are then integrated with structured prompts to guide a large language model in synthesizing natural dialogue. To ensure contextual and emotional consistency across scenes, we introduce a Recursive Narrative Bank -- a speaker-aware, temporally structured memory that recursively accumulates each character's dialogue history. Inspired by Script Theory in cognitive psychology, this design enables characters to speak in ways that reflect their evolving goals, social context, and narrative roles throughout the story. Finally, we render each utterance as expressive, character-conditioned speech, resulting in fully-voiced, multimodal video narratives. Our training-free framework generalizes across diverse story settings -- from fantasy adventures to slice-of-life episodes -- offering a scalable solution for coherent, character-grounded audiovisual storytelling. △ Less

Submitted 27 September, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

Comments: 22 pages, 5 figures; revised supplementary document, etc

arXiv:2503.06310 [pdf, ps, other]

Text2Story: Advancing Video Storytelling with Text Guidance

Authors: Taewon Kang, Divya Kothandaraman, Ming C. Lin

Abstract: Generating coherent long-form video sequences from discrete input using only text prompts is a critical task in content creation. While diffusion-based models excel at short video synthesis, long-form storytelling from text remains largely unexplored and a challenge due to difficulties in temporal coherency, preserving semantic meaning, and maintaining both scene context and action continuity acro… ▽ More Generating coherent long-form video sequences from discrete input using only text prompts is a critical task in content creation. While diffusion-based models excel at short video synthesis, long-form storytelling from text remains largely unexplored and a challenge due to difficulties in temporal coherency, preserving semantic meaning, and maintaining both scene context and action continuity across the video. We introduce a novel storytelling framework that achieves this by integrating scene and action prompts through dynamics-inspired prompt mixing. Specifically, we first present a bidirectional time-weighted latent blending strategy to ensure temporal consistency between segments of the long-form video being generated. We then propose a dynamics-informed prompt weighting (DIPW) mechanism that adaptively balances the influence of scene and action prompts at each diffusion timestep by jointly considering CLIP-based alignment, narrative continuity, and temporal smoothness. To further enhance motion continuity, we incorporate a semantic action representation to encode high-level action semantics into the blending process, dynamically adjusting transitions based on action similarity and ensuring smooth yet adaptable motion changes. Latent space blending maintains spatial coherence between objects in a scene, while time-weighted blending enforces bidirectional constraints for temporal consistency. The resulting integrative system prevents abrupt transitions while ensuring fluid storytelling that faithfully reflects both scene and action cues. Extensive experiments demonstrate significant improvements over baselines, achieving temporally consistent and visually compelling video narratives without any additional training. This approach bridges the gap between short clips and extended video to establish a new paradigm in GenAI-driven video synthesis from text. △ Less

Submitted 27 September, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

Comments: 16 pages, 4 figures; revised method figure, etc

arXiv:2503.04994 [pdf, other]

Quantifying and Modeling Driving Styles in Trajectory Forecasting

Authors: Laura Zheng, Hamidreza Yaghoubi Araghi, Tony Wu, Sandeep Thalapanane, Tianyi Zhou, Ming C. Lin

Abstract: Trajectory forecasting has become a popular deep learning task due to its relevance for scenario simulation for autonomous driving. Specifically, trajectory forecasting predicts the trajectory of a short-horizon future for specific human drivers in a particular traffic scenario. Robust and accurate future predictions can enable autonomous driving planners to optimize for low-risk and predictable o… ▽ More Trajectory forecasting has become a popular deep learning task due to its relevance for scenario simulation for autonomous driving. Specifically, trajectory forecasting predicts the trajectory of a short-horizon future for specific human drivers in a particular traffic scenario. Robust and accurate future predictions can enable autonomous driving planners to optimize for low-risk and predictable outcomes for human drivers around them. Although some work has been done to model driving style in planning and personalized autonomous polices, a gap exists in explicitly modeling human driving styles for trajectory forecasting of human behavior. Human driving style is most certainly a correlating factor to decision making, especially in edge-case scenarios where risk is nontrivial, as justified by the large amount of traffic psychology literature on risky driving. So far, the current real-world datasets for trajectory forecasting lack insight on the variety of represented driving styles. While the datasets may represent real-world distributions of driving styles, we posit that fringe driving style types may also be correlated with edge-case safety scenarios. In this work, we conduct analyses on existing real-world trajectory datasets for driving and dissect these works from the lens of driving styles, which is often intangible and non-standardized. △ Less

Submitted 6 March, 2025; originally announced March 2025.

arXiv:2502.00050 [pdf, other]

DISC: Dataset for Analyzing Driving Styles In Simulated Crashes for Mixed Autonomy

Authors: Sandip Sharan Senthil Kumar, Sandeep Thalapanane, Guru Nandhan Appiya Dilipkumar Peethambari, Sourang SriHari, Laura Zheng, Ming C. Lin

Abstract: Handling pre-crash scenarios is still a major challenge for self-driving cars due to limited practical data and human-driving behavior datasets. We introduce DISC (Driving Styles In Simulated Crashes), one of the first datasets designed to capture various driving styles and behaviors in pre-crash scenarios for mixed autonomy analysis. DISC includes over 8 classes of driving styles/behaviors from h… ▽ More Handling pre-crash scenarios is still a major challenge for self-driving cars due to limited practical data and human-driving behavior datasets. We introduce DISC (Driving Styles In Simulated Crashes), one of the first datasets designed to capture various driving styles and behaviors in pre-crash scenarios for mixed autonomy analysis. DISC includes over 8 classes of driving styles/behaviors from hundreds of drivers navigating a simulated vehicle through a virtual city, encountering rare-event traffic scenarios. This dataset enables the classification of pre-crash human driving behaviors in unsafe conditions, supporting individualized trajectory prediction based on observed driving patterns. By utilizing a custom-designed VR-based in-house driving simulator, TRAVERSE, data was collected through a driver-centric study involving human drivers encountering twelve simulated accident scenarios. This dataset fills a critical gap in human-centric driving data for rare events involving interactions with autonomous vehicles. It enables autonomous systems to better react to human drivers and optimize trajectory prediction in mixed autonomy environments involving both human-driven and self-driving cars. In addition, individual driving behaviors are classified through a set of standardized questionnaires, carefully designed to identify and categorize driving behavior traits. We correlate data features with driving behaviors, showing that the simulated environment reflects real-world driving styles. DISC is the first dataset to capture how various driving styles respond to accident scenarios, offering significant potential to enhance autonomous vehicle safety and driving behavior analysis in mixed autonomy environments. △ Less

Submitted 28 January, 2025; originally announced February 2025.

arXiv:2412.16776 [pdf, ps, other]

DMesh++: An Efficient Differentiable Mesh for Complex Shapes

Authors: Sanghyun Son, Matheus Gadelha, Yang Zhou, Matthew Fisher, Zexiang Xu, Yi-Ling Qiao, Ming C. Lin, Yi Zhou

Abstract: Recent probabilistic methods for 3D triangular meshes capture diverse shapes by differentiable mesh connectivity, but face high computational costs with increased shape details. We introduce a new differentiable mesh processing method that addresses this challenge and efficiently handles meshes with intricate structures. Our method reduces time complexity from O(N) to O(log N) and requires signifi… ▽ More Recent probabilistic methods for 3D triangular meshes capture diverse shapes by differentiable mesh connectivity, but face high computational costs with increased shape details. We introduce a new differentiable mesh processing method that addresses this challenge and efficiently handles meshes with intricate structures. Our method reduces time complexity from O(N) to O(log N) and requires significantly less memory than previous approaches. Building on this innovation, we present a reconstruction algorithm capable of generating complex 2D and 3D shapes from point clouds or multi-view images. Visit our project page (https://sonsang.github.io/dmesh2-project) for source code and supplementary material. △ Less

Submitted 6 July, 2025; v1 submitted 21 December, 2024; originally announced December 2024.

Comments: 20 pages, 24 figures, 6 tables

arXiv:2412.16750 [pdf, other]

Gradient-based Trajectory Optimization with Parallelized Differentiable Traffic Simulation

Authors: Sanghyun Son, Laura Zheng, Brian Clipp, Connor Greenwell, Sujin Philip, Ming C. Lin

Abstract: We present a parallelized differentiable traffic simulator based on the Intelligent Driver Model (IDM), a car-following framework that incorporates driver behavior as key variables. Our vehicle simulator efficiently models vehicle motion, generating trajectories that can be supervised to fit real-world data. By leveraging its differentiable nature, IDM parameters are optimized using gradient-based… ▽ More We present a parallelized differentiable traffic simulator based on the Intelligent Driver Model (IDM), a car-following framework that incorporates driver behavior as key variables. Our vehicle simulator efficiently models vehicle motion, generating trajectories that can be supervised to fit real-world data. By leveraging its differentiable nature, IDM parameters are optimized using gradient-based methods. With the capability to simulate up to 2 million vehicles in real time, the system is scalable for large-scale trajectory optimization. We show that we can use the simulator to filter noise in the input trajectories (trajectory filtering), reconstruct dense trajectories from sparse ones (trajectory reconstruction), and predict future trajectories (trajectory prediction), with all generated trajectories adhering to physical laws. We validate our simulator and algorithm on several datasets including NGSIM and Waymo Open Dataset. The code is publicly available at: https://github.com/SonSang/diffidm. △ Less

Submitted 17 February, 2025; v1 submitted 21 December, 2024; originally announced December 2024.

Comments: 9 pages, 6 figures, 3 tables

arXiv:2411.01391 [pdf, other]

Differentiable Quantum Computing for Large-scale Linear Control

Authors: Connor Clayton, Jiaqi Leng, Gengzhi Yang, Yi-Ling Qiao, Ming C. Lin, Xiaodi Wu

Abstract: As industrial models and designs grow increasingly complex, the demand for optimal control of large-scale dynamical systems has significantly increased. However, traditional methods for optimal control incur significant overhead as problem dimensions grow. In this paper, we introduce an end-to-end quantum algorithm for linear-quadratic control with provable speedups. Our algorithm, based on a poli… ▽ More As industrial models and designs grow increasingly complex, the demand for optimal control of large-scale dynamical systems has significantly increased. However, traditional methods for optimal control incur significant overhead as problem dimensions grow. In this paper, we introduce an end-to-end quantum algorithm for linear-quadratic control with provable speedups. Our algorithm, based on a policy gradient method, incorporates a novel quantum subroutine for solving the matrix Lyapunov equation. Specifically, we build a quantum-assisted differentiable simulator for efficient gradient estimation that is more accurate and robust than classical methods relying on stochastic approximation. Compared to the classical approaches, our method achieves a super-quadratic speedup. To the best of our knowledge, this is the first end-to-end quantum application to linear control problems with provable quantum advantage. △ Less

Submitted 2 November, 2024; originally announced November 2024.

arXiv:2410.00292 [pdf, other]

Insight: A Multi-Modal Diagnostic Pipeline using LLMs for Ocular Surface Disease Diagnosis

Authors: Chun-Hsiao Yeh, Jiayun Wang, Andrew D. Graham, Andrea J. Liu, Bo Tan, Yubei Chen, Yi Ma, Meng C. Lin

Abstract: Accurate diagnosis of ocular surface diseases is critical in optometry and ophthalmology, which hinge on integrating clinical data sources (e.g., meibography imaging and clinical metadata). Traditional human assessments lack precision in quantifying clinical observations, while current machine-based methods often treat diagnoses as multi-class classification problems, limiting the diagnoses to a p… ▽ More Accurate diagnosis of ocular surface diseases is critical in optometry and ophthalmology, which hinge on integrating clinical data sources (e.g., meibography imaging and clinical metadata). Traditional human assessments lack precision in quantifying clinical observations, while current machine-based methods often treat diagnoses as multi-class classification problems, limiting the diagnoses to a predefined closed-set of curated answers without reasoning the clinical relevance of each variable to the diagnosis. To tackle these challenges, we introduce an innovative multi-modal diagnostic pipeline (MDPipe) by employing large language models (LLMs) for ocular surface disease diagnosis. We first employ a visual translator to interpret meibography images by converting them into quantifiable morphology data, facilitating their integration with clinical metadata and enabling the communication of nuanced medical insight to LLMs. To further advance this communication, we introduce a LLM-based summarizer to contextualize the insight from the combined morphology and clinical metadata, and generate clinical report summaries. Finally, we refine the LLMs' reasoning ability with domain-specific insight from real-life clinician diagnoses. Our evaluation across diverse ocular surface disease diagnosis benchmarks demonstrates that MDPipe outperforms existing standards, including GPT-4, and provides clinically sound rationales for diagnoses. △ Less

Submitted 30 September, 2024; originally announced October 2024.

Comments: Accepted to MICCAI 2024. Project Webpage: https://danielchyeh.github.io/MDPipe/

arXiv:2408.06157 [pdf, ps, other]

3D-free meets 3D priors: Novel View Synthesis from a Single Image with Pretrained Diffusion Guidance

Authors: Taewon Kang, Divya Kothandaraman, Dinesh Manocha, Ming C. Lin

Abstract: Recent 3D novel view synthesis (NVS) methods often require extensive 3D data for training, and also typically lack generalization beyond the training distribution. Moreover, they tend to be object centric and struggle with complex and intricate scenes. Conversely, 3D-free methods can generate text-controlled views of complex, in-the-wild scenes using a pretrained stable diffusion model without the… ▽ More Recent 3D novel view synthesis (NVS) methods often require extensive 3D data for training, and also typically lack generalization beyond the training distribution. Moreover, they tend to be object centric and struggle with complex and intricate scenes. Conversely, 3D-free methods can generate text-controlled views of complex, in-the-wild scenes using a pretrained stable diffusion model without the need for a large amount of 3D-based training data, but lack camera control. In this paper, we introduce a method capable of generating camera-controlled viewpoints from a single input image, by combining the benefits of 3D-free and 3D-based approaches. Our method excels in handling complex and diverse scenes without extensive training or additional 3D and multiview data. It leverages widely available pretrained NVS models for weak guidance, integrating this knowledge into a 3D-free view synthesis style approach, along with enriching the CLIP vision-language space with 3D camera angle information, to achieve the desired results. Experimental results demonstrate that our method outperforms existing models in both qualitative and quantitative evaluations, achieving high-fidelity, consistent novel view synthesis at desired camera angles across a wide variety of scenes while maintaining accurate, natural detail representation and image clarity across various viewpoints. We also support our method with a comprehensive analysis of 2D image generation models and the 3D space, providing a solid foundation and rationale for our solution. △ Less

Submitted 15 November, 2025; v1 submitted 12 August, 2024; originally announced August 2024.

Comments: Accepted to The 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26), AAAI 2026 Workshop on AI for Environmental Science (AI4ES). Due to arXiv's 1,920-character limit, the abstract here is shortened. Please refer to the paper (View PDF) to read the full abstract. 14 pages, 13 figures, v5: AAAI-26 camera-ready version

arXiv:2407.09466 [pdf, other]

TRAVERSE: Traffic-Responsive Autonomous Vehicle Experience & Rare-event Simulation for Enhanced safety

Authors: Sandeep Thalapanane, Sandip Sharan Senthil Kumar, Guru Nandhan Appiya Dilipkumar Peethambari, Sourang SriHari, Laura Zheng, Julio Poveda, Ming C. Lin

Abstract: Data for training learning-enabled self-driving cars in the physical world are typically collected in a safe, normal environment. Such data distribution often engenders a strong bias towards safe driving, making self-driving cars unprepared when encountering adversarial scenarios like unexpected accidents. Due to a dearth of such adverse data that is unrealistic for drivers to collect, autonomous… ▽ More Data for training learning-enabled self-driving cars in the physical world are typically collected in a safe, normal environment. Such data distribution often engenders a strong bias towards safe driving, making self-driving cars unprepared when encountering adversarial scenarios like unexpected accidents. Due to a dearth of such adverse data that is unrealistic for drivers to collect, autonomous vehicles can perform poorly when experiencing such rare events. This work addresses much-needed research by having participants drive a VR vehicle simulator going through simulated traffic with various types of accidental scenarios. It aims to understand human responses and behaviors in simulated accidents, contributing to our understanding of driving dynamics and safety. The simulation framework adopts a robust traffic simulation and is rendered using the Unity Game Engine. Furthermore, the simulation framework is built with portable, light-weight immersive driving simulator hardware, lowering the resource barrier for studies in autonomous driving research. Keywords: Rare Events, Traffic Simulation, Autonomous Driving, Virtual Reality, User Studies △ Less

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2406.09648 [pdf, other]

An Intrinsic Vector Heat Network

Authors: Alexander Gao, Maurice Chu, Mubbasir Kapadia, Ming C. Lin, Hsueh-Ti Derek Liu

Abstract: Vector fields are widely used to represent and model flows for many science and engineering applications. This paper introduces a novel neural network architecture for learning tangent vector fields that are intrinsically defined on manifold surfaces embedded in 3D. Previous approaches to learning vector fields on surfaces treat vectors as multi-dimensional scalar fields, using traditional scalar-… ▽ More Vector fields are widely used to represent and model flows for many science and engineering applications. This paper introduces a novel neural network architecture for learning tangent vector fields that are intrinsically defined on manifold surfaces embedded in 3D. Previous approaches to learning vector fields on surfaces treat vectors as multi-dimensional scalar fields, using traditional scalar-valued architectures to process channels individually, thus fail to preserve fundamental intrinsic properties of the vector field. The core idea of this work is to introduce a trainable vector heat diffusion module to spatially propagate vector-valued feature data across the surface, which we incorporate into our proposed architecture that consists of vector-valued neurons. Our architecture is invariant to rigid motion of the input, isometric deformation, and choice of local tangent bases, and is robust to discretizations of the surface. We evaluate our Vector Heat Network on triangle meshes, and empirically validate its invariant properties. We also demonstrate the effectiveness of our method on the useful industrial application of quadrilateral mesh generation. △ Less

Submitted 18 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.01431 [pdf, other]

Deep Stochastic Kinematic Models for Probabilistic Motion Forecasting in Traffic

Authors: Laura Zheng, Sanghyun Son, Jing Liang, Xijun Wang, Brian Clipp, Ming C. Lin

Abstract: In trajectory forecasting tasks for traffic, future output trajectories can be computed by advancing the ego vehicle's state with predicted actions according to a kinematics model. By unrolling predicted trajectories via time integration and models of kinematic dynamics, predicted trajectories should not only be kinematically feasible but also relate uncertainty from one timestep to the next. Whil… ▽ More In trajectory forecasting tasks for traffic, future output trajectories can be computed by advancing the ego vehicle's state with predicted actions according to a kinematics model. By unrolling predicted trajectories via time integration and models of kinematic dynamics, predicted trajectories should not only be kinematically feasible but also relate uncertainty from one timestep to the next. While current works in probabilistic prediction do incorporate kinematic priors for mean trajectory prediction, _variance_ is often left as a learnable parameter, despite uncertainty in one time step being inextricably tied to uncertainty in the previous time step. In this paper, we show simple and differentiable analytical approximations describing the relationship between variance at one timestep and that at the next with the kinematic bicycle model. In our results, we find that encoding the relationship between variance across timesteps works especially well in unoptimal settings, such as with small or noisy datasets. We observe up to a 50% performance boost in partial dataset settings and up to an 8% performance boost in large-scale learning compared to previous kinematic prediction methods on SOTA trajectory forecasting architectures out-of-the-box, with no fine-tuning. △ Less

Submitted 6 September, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: 8 pages

arXiv:2406.01425 [pdf, ps, other]

Adaptive Sensitivity Analysis for Robust Augmentation against Natural Corruptions in Image Segmentation

Authors: Laura Zheng, Wenjie Wei, Tony Wu, Jacob Clements, Shreelekha Revankar, Andre Harrison, Yu Shen, Ming C. Lin

Abstract: Achieving robustness in image segmentation models is challenging due to the fine-grained nature of pixel-level classification. These models, which are crucial for many real-time perception applications, particularly struggle when faced with natural corruptions in the wild for autonomous systems. While sensitivity analysis can help us understand how input variables influence model outputs, its appl… ▽ More Achieving robustness in image segmentation models is challenging due to the fine-grained nature of pixel-level classification. These models, which are crucial for many real-time perception applications, particularly struggle when faced with natural corruptions in the wild for autonomous systems. While sensitivity analysis can help us understand how input variables influence model outputs, its application to natural and uncontrollable corruptions in training data is computationally expensive. In this work, we present an adaptive, sensitivity-guided augmentation method to enhance robustness against natural corruptions. Our sensitivity analysis on average runs 10x faster and requires about 200x less storage than previous sensitivity analysis, enabling practical, on-the-fly estimation during training for a model-free augmentation policy. With minimal fine-tuning, our sensitivity-guided augmentation method achieves improved robustness on both real-world and synthetic datasets compared to state-of-the-art data augmentation techniques in image segmentation. Code implementation for this work can be found at: https://github.com/laurayuzheng/SensAug. △ Less

Submitted 16 June, 2025; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: 9 pages

arXiv:2404.13445 [pdf, ps, other]

DMesh: A Differentiable Mesh Representation

Authors: Sanghyun Son, Matheus Gadelha, Yang Zhou, Zexiang Xu, Ming C. Lin, Yi Zhou

Abstract: We present a differentiable representation, DMesh, for general 3D triangular meshes. DMesh considers both the geometry and connectivity information of a mesh. In our design, we first get a set of convex tetrahedra that compactly tessellates the domain based on Weighted Delaunay Triangulation (WDT), and select triangular faces on the tetrahedra to define the final mesh. We formulate probability of… ▽ More We present a differentiable representation, DMesh, for general 3D triangular meshes. DMesh considers both the geometry and connectivity information of a mesh. In our design, we first get a set of convex tetrahedra that compactly tessellates the domain based on Weighted Delaunay Triangulation (WDT), and select triangular faces on the tetrahedra to define the final mesh. We formulate probability of faces to exist on the actual surface in a differentiable manner based on the WDT. This enables DMesh to represent meshes of various topology in a differentiable way, and allows us to reconstruct the mesh under various observations, such as point cloud and multi-view images using gradient-based optimization. The source code and full paper is available at: https://sonsang.github.io/dmesh-project. △ Less

Submitted 5 July, 2025; v1 submitted 20 April, 2024; originally announced April 2024.

Comments: 36 pages, 24 figures. Updated with camera-ready version

arXiv:2312.08710 [pdf, other]

Gradient Informed Proximal Policy Optimization

Authors: Sanghyun Son, Laura Yu Zheng, Ryan Sullivan, Yi-Ling Qiao, Ming C. Lin

Abstract: We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we introduce the concept of an α-policy that stands as a locally superior policy. By adaptively modifying the α value, we can effectively manage the influence of analytica… ▽ More We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we introduce the concept of an α-policy that stands as a locally superior policy. By adaptively modifying the α value, we can effectively manage the influence of analytical policy gradients during learning. To this end, we suggest metrics for assessing the variance and bias of analytical gradients, reducing dependence on these gradients when high variance or bias is detected. Our proposed approach outperforms baseline algorithms in various scenarios, such as function optimization, physics simulations, and traffic control environments. Our code can be found online: https://github.com/SonSang/gippo. △ Less

Submitted 14 December, 2023; originally announced December 2023.

Comments: 27 pages, NeurIPS 2023 Conference

arXiv:2310.20491 [pdf, other]

Collaborative Decision-Making Using Spatiotemporal Graphs in Connected Autonomy

Authors: Peng Gao, Yu Shen, Ming C. Lin

Abstract: Collaborative decision-making is an essential capability for multi-robot systems, such as connected vehicles, to collaboratively control autonomous vehicles in accident-prone scenarios. Under limited communication bandwidth, capturing comprehensive situational awareness by integrating connected agents' observation is very challenging. In this paper, we propose a novel collaborative decision-making… ▽ More Collaborative decision-making is an essential capability for multi-robot systems, such as connected vehicles, to collaboratively control autonomous vehicles in accident-prone scenarios. Under limited communication bandwidth, capturing comprehensive situational awareness by integrating connected agents' observation is very challenging. In this paper, we propose a novel collaborative decision-making method that efficiently and effectively integrates collaborators' representations to control the ego vehicle in accident-prone scenarios. Our approach formulates collaborative decision-making as a classification problem. We first represent sequences of raw observations as spatiotemporal graphs, which significantly reduce the package size to share among connected vehicles. Then we design a novel spatiotemporal graph neural network based on heterogeneous graph learning, which analyzes spatial and temporal connections of objects in a unified way for collaborative decision-making. We evaluate our approach using a high-fidelity simulator that considers realistic traffic, communication bandwidth, and vehicle sensing among connected autonomous vehicles. The experimental results show that our representation achieves over 100x reduction in the shared data size that meets the requirements of communication bandwidth for connected autonomous driving. In addition, our approach achieves over 30% improvements in driving safety. △ Less

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2309.08214 [pdf, other]

MTG: Mapless Trajectory Generator with Traversability Coverage for Outdoor Navigation

Authors: Jing Liang, Peng Gao, Xuesu Xiao, Adarsh Jagan Sathyamoorthy, Mohamed Elnoor, Ming C. Lin, Dinesh Manocha

Abstract: We present a novel learning-based trajectory generation algorithm for outdoor robot navigation. Our goal is to compute collision-free paths that also satisfy the environment-specific traversability constraints. Our approach is designed for global planning using limited onboard robot perception in mapless environments while ensuring comprehensive coverage of all traversable directions. Our formulat… ▽ More We present a novel learning-based trajectory generation algorithm for outdoor robot navigation. Our goal is to compute collision-free paths that also satisfy the environment-specific traversability constraints. Our approach is designed for global planning using limited onboard robot perception in mapless environments while ensuring comprehensive coverage of all traversable directions. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model that is enhanced with traversability constraints and an optimization formulation used for the coverage. We highlight the benefits of our approach over state-of-the-art trajectory generation approaches and demonstrate its performance in challenging and large outdoor environments, including around buildings, across intersections, along trails, and off-road terrain, using a Clearpath Husky and a Boston Dynamics Spot robot. In practice, our approach results in a 6% improvement in coverage of traversable areas and an 89% reduction in trajectory portions residing in non-traversable regions. Our video is here: https://youtu.be/3eJ2soAzXnU △ Less

Submitted 4 March, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: 9

arXiv:2309.04581 [pdf, other]

Dynamic Mesh-Aware Radiance Fields

Authors: Yi-Ling Qiao, Alexander Gao, Yiran Xu, Yue Feng, Jia-Bin Huang, Ming C. Lin

Abstract: Embedding polygonal mesh assets within photorealistic Neural Radience Fields (NeRF) volumes, such that they can be rendered and their dynamics simulated in a physically consistent manner with the NeRF, is under-explored from the system perspective of integrating NeRF into the traditional graphics pipeline. This paper designs a two-way coupling between mesh and NeRF during rendering and simulation.… ▽ More Embedding polygonal mesh assets within photorealistic Neural Radience Fields (NeRF) volumes, such that they can be rendered and their dynamics simulated in a physically consistent manner with the NeRF, is under-explored from the system perspective of integrating NeRF into the traditional graphics pipeline. This paper designs a two-way coupling between mesh and NeRF during rendering and simulation. We first review the light transport equations for both mesh and NeRF, then distill them into an efficient algorithm for updating radiance and throughput along a cast ray with an arbitrary number of bounces. To resolve the discrepancy between the linear color space that the path tracer assumes and the sRGB color space that standard NeRF uses, we train NeRF with High Dynamic Range (HDR) images. We also present a strategy to estimate light sources and cast shadows on the NeRF. Finally, we consider how the hybrid surface-volumetric formulation can be efficiently integrated with a high-performance physics simulator that supports cloth, rigid and soft bodies. The full rendering and simulation system can be run on a GPU at interactive rates. We show that a hybrid system approach outperforms alternatives in visual realism for mesh insertion, because it allows realistic light transport from volumetric NeRF media onto surfaces, which affects the appearance of reflective/refractive surfaces and illumination of diffuse surfaces informed by the dynamic scene. △ Less

Submitted 8 September, 2023; originally announced September 2023.

Comments: ICCV 2023

arXiv:2303.04901 [pdf, other]

Towards Driving Policies with Personality: Modeling Behavior and Style in Risky Scenarios via Data Collection in Virtual Reality

Authors: Laura Zheng, Julio Poveda, James Mullen, Shreelekha Revankar, Ming C. Lin

Abstract: Autonomous driving research currently faces data sparsity in representation of risky scenarios. Such data is both difficult to obtain ethically in the real world, and unreliable to obtain via simulation. Recent advances in virtual reality (VR) driving simulators lower barriers to tackling this problem in simulation. We propose the first data collection framework for risky scenario driving data fro… ▽ More Autonomous driving research currently faces data sparsity in representation of risky scenarios. Such data is both difficult to obtain ethically in the real world, and unreliable to obtain via simulation. Recent advances in virtual reality (VR) driving simulators lower barriers to tackling this problem in simulation. We propose the first data collection framework for risky scenario driving data from real humans using VR, as well as accompanying numerical driving personality characterizations. We validate the resulting dataset with statistical analyses and model driving behavior with an eight-factor personality vector based on the Multi-dimensional Driving Style Inventory (MDSI). Our method, dataset, and analyses show that realistic driving personalities can be modeled without deep learning or large datasets to complement autonomous driving research. △ Less

Submitted 8 March, 2023; originally announced March 2023.

arXiv:2301.05311 [pdf, other]

A Framework for Active Haptic Guidance Using Robotic Haptic Proxies

Authors: Niall L. Williams, Nicholas Rewkowski, Jiasheng Li, Ming C. Lin

Abstract: Haptic feedback is an important component of creating an immersive mixed reality experience. Traditionally, haptic forces are rendered in response to the user's interactions with the virtual environment. In this work, we explore the idea of rendering haptic forces in a proactive manner, with the explicit intention to influence the user's behavior through compelling haptic forces. To this end, we p… ▽ More Haptic feedback is an important component of creating an immersive mixed reality experience. Traditionally, haptic forces are rendered in response to the user's interactions with the virtual environment. In this work, we explore the idea of rendering haptic forces in a proactive manner, with the explicit intention to influence the user's behavior through compelling haptic forces. To this end, we present a framework for active haptic guidance in mixed reality, using one or more robotic haptic proxies to influence user behavior and deliver a safer and more immersive virtual experience. We provide details on common challenges that need to be overcome when implementing active haptic guidance, and discuss example applications that show how active haptic guidance can be used to influence the user's behavior. Finally, we apply active haptic guidance to a virtual reality navigation problem, and conduct a user study that demonstrates how active haptic guidance creates a safer and more immersive experience for users. △ Less

Submitted 27 February, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

arXiv:2212.03450 [pdf, other]

Tracking the Dynamics of the Tear Film Lipid Layer

Authors: Tejasvi Kothapalli, Charlie Shou, Jennifer Ding, Jiayun Wang, Andrew D. Graham, Tatyana Svitova, Stella X. Yu, Meng C. Lin

Abstract: Dry Eye Disease (DED) is one of the most common ocular diseases: over five percent of US adults suffer from DED. Tear film instability is a known factor for DED, and is thought to be regulated in large part by the thin lipid layer that covers and stabilizes the tear film. In order to aid eye related disease diagnosis, this work proposes a novel paradigm in using computer vision techniques to numer… ▽ More Dry Eye Disease (DED) is one of the most common ocular diseases: over five percent of US adults suffer from DED. Tear film instability is a known factor for DED, and is thought to be regulated in large part by the thin lipid layer that covers and stabilizes the tear film. In order to aid eye related disease diagnosis, this work proposes a novel paradigm in using computer vision techniques to numerically analyze the tear film lipid layer (TFLL) spread. Eleven videos of the tear film lipid layer spread are collected with a micro-interferometer and a subset are annotated. A tracking algorithm relying on various pillar computer vision techniques is developed. Our method can be found at https://easytear-dev.github.io/. △ Less

Submitted 6 December, 2022; originally announced December 2022.

Comments: NeurIPS Medical Imaging Workshop

arXiv:2210.12352 [pdf, other]

NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos

Authors: Yi-Ling Qiao, Alexander Gao, Ming C. Lin

Abstract: We present a method for learning 3D geometry and physics parameters of a dynamic scene from only a monocular RGB video input. To decouple the learning of underlying scene geometry from dynamic motion, we represent the scene as a time-invariant signed distance function (SDF) which serves as a reference frame, along with a time-conditioned deformation field. We further bridge this neural geometry re… ▽ More We present a method for learning 3D geometry and physics parameters of a dynamic scene from only a monocular RGB video input. To decouple the learning of underlying scene geometry from dynamic motion, we represent the scene as a time-invariant signed distance function (SDF) which serves as a reference frame, along with a time-conditioned deformation field. We further bridge this neural geometry representation with a differentiable physics simulator by designing a two-way conversion between the neural field and its corresponding hexahedral mesh, enabling us to estimate physics parameters from the source video by minimizing a cycle consistency loss. Our method also allows a user to interactively edit 3D objects from the source video by modifying the recovered hexahedral mesh, and propagating the operation back to the neural field representation. Experiments show that our method achieves superior mesh and video reconstruction of dynamic scenes compared to competing Neural Field approaches, and we provide extensive examples which demonstrate its ability to extract useful 3D representations from videos captured with consumer-grade cameras. △ Less

Submitted 22 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022

arXiv:2210.08046 [pdf, other]

Differentiable Hybrid Traffic Simulation

Authors: Sanghyun Son, Yi-Ling Qiao, Jason Sewall, Ming C. Lin

Abstract: We introduce a novel differentiable hybrid traffic simulator, which simulates traffic using a hybrid model of both macroscopic and microscopic models and can be directly integrated into a neural network for traffic control and flow optimization. This is the first differentiable traffic simulator for macroscopic and hybrid models that can compute gradients for traffic states across time steps and i… ▽ More We introduce a novel differentiable hybrid traffic simulator, which simulates traffic using a hybrid model of both macroscopic and microscopic models and can be directly integrated into a neural network for traffic control and flow optimization. This is the first differentiable traffic simulator for macroscopic and hybrid models that can compute gradients for traffic states across time steps and inhomogeneous lanes. To compute the gradient flow between two types of traffic models in a hybrid framework, we present a novel intermediate conversion component that bridges the lanes in a differentiable manner as well. We also show that we can use analytical gradients to accelerate the overall process and enhance scalability. Thanks to these gradients, our simulator can provide more efficient and scalable solutions for complex learning and control problems posed in traffic engineering than other existing algorithms. Refer to https://sites.google.com/umd.edu/diff-hybrid-traffic-sim for our project. △ Less

Submitted 14 October, 2022; originally announced October 2022.

Comments: 13 pages, Siggraph Asia 2022 Journal Paper

ACM Class: I.6.1; I.6.3

arXiv:2210.05667 [pdf, other]

Human Body Measurement Estimation with Adversarial Augmentation

Authors: Nataniel Ruiz, Miriam Bellver, Timo Bolkart, Ambuj Arora, Ming C. Lin, Javier Romero, Raja Bala

Abstract: We present a Body Measurement network (BMnet) for estimating 3D anthropomorphic measurements of the human body shape from silhouette images. Training of BMnet is performed on data from real human subjects, and augmented with a novel adversarial body simulator (ABS) that finds and synthesizes challenging body shapes. ABS is based on the skinned multiperson linear (SMPL) body model, and aims to maxi… ▽ More We present a Body Measurement network (BMnet) for estimating 3D anthropomorphic measurements of the human body shape from silhouette images. Training of BMnet is performed on data from real human subjects, and augmented with a novel adversarial body simulator (ABS) that finds and synthesizes challenging body shapes. ABS is based on the skinned multiperson linear (SMPL) body model, and aims to maximize BMnet measurement prediction error with respect to latent SMPL shape parameters. ABS is fully differentiable with respect to these parameters, and trained end-to-end via backpropagation with BMnet in the loop. Experiments show that ABS effectively discovers adversarial examples, such as bodies with extreme body mass indices (BMI), consistent with the rarity of extreme-BMI bodies in BMnet's training set. Thus ABS is able to reveal gaps in training data and potential failures in predicting under-represented body shapes. Results show that training BMnet with ABS improves measurement prediction accuracy on real bodies by up to 10%, when compared to no augmentation or random body shape sampling. Furthermore, our method significantly outperforms SOTA measurement estimation methods by as much as 3x. Finally, we release BodyM, the first challenging, large-scale dataset of photo silhouettes and body measurements of real human subjects, to further promote research in this area. Project website: https://adversarialbodysim.github.io △ Less

Submitted 11 October, 2022; originally announced October 2022.

Comments: Published at the International Conference on 3D Vision (3DV) 2022

arXiv:2210.03772 [pdf, other]

Traffic-Aware Autonomous Driving with Differentiable Traffic Simulation

Authors: Laura Zheng, Sanghyun Son, Ming C. Lin

Abstract: While there have been advancements in autonomous driving control and traffic simulation, there have been little to no works exploring their unification with deep learning. Works in both areas seem to focus on entirely different exclusive problems, yet traffic and driving are inherently related in the real world. In this paper, we present Traffic-Aware Autonomous Driving (TrAAD), a generalizable di… ▽ More While there have been advancements in autonomous driving control and traffic simulation, there have been little to no works exploring their unification with deep learning. Works in both areas seem to focus on entirely different exclusive problems, yet traffic and driving are inherently related in the real world. In this paper, we present Traffic-Aware Autonomous Driving (TrAAD), a generalizable distillation-style method for traffic-informed imitation learning that directly optimizes for faster traffic flow and lower energy consumption. TrAAD focuses on the supervision of speed control in imitation learning systems, as most driving research focuses on perception and steering. Moreover, our method addresses the lack of co-simulation between traffic and driving simulators and provides a basis for directly involving traffic simulation with autonomous driving in future work. Our results show that, with information from traffic simulation involved in the supervision of imitation learning methods, an autonomous vehicle can learn how to accelerate in a fashion that is beneficial for traffic flow and overall energy consumption for all nearby vehicles. △ Less

Submitted 6 April, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

arXiv:2205.01758 [pdf, other]

Differentiable Simulation of Soft Multi-body Systems

Authors: Yi-Ling Qiao, Junbang Liang, Vladlen Koltun, Ming C. Lin

Abstract: We present a method for differentiable simulation of soft articulated bodies. Our work enables the integration of differentiable physical dynamics into gradient-based pipelines. We develop a top-down matrix assembly algorithm within Projective Dynamics and derive a generalized dry friction model for soft continuum using a new matrix splitting strategy. We derive a differentiable control framework… ▽ More We present a method for differentiable simulation of soft articulated bodies. Our work enables the integration of differentiable physical dynamics into gradient-based pipelines. We develop a top-down matrix assembly algorithm within Projective Dynamics and derive a generalized dry friction model for soft continuum using a new matrix splitting strategy. We derive a differentiable control framework for soft articulated bodies driven by muscles, joint torques, or pneumatic tubes. The experiments demonstrate that our designs make soft body simulation more stable and realistic compared to other frameworks. Our method accelerates the solution of system identification problems by more than an order of magnitude, and enables efficient gradient-based learning of motion control with soft robots. △ Less

Submitted 3 May, 2022; originally announced May 2022.

Comments: NeurIPS 2021

arXiv:2204.11785 [pdf, ps, other]

Graphs whose vertices of degree at least 2 lie in a triangle

Authors: Vinicius L. do Forte, Min Chih Lin, Abilio Lucena, Nelson Maculan, Veronica A. Moyano, Jayme L. Szwarcfiter

Abstract: A pendant vertex is one of degree one and an isolated vertex has degree zero. A neighborhood star-free (NSF for short) graph is one in which every vertex is contained in a triangle except pendant vertices and isolated vertices. This class has been considered before for several contexts. In the present paper, we study the complexity of the dominating induced matching (DIM) problem and the perfect e… ▽ More A pendant vertex is one of degree one and an isolated vertex has degree zero. A neighborhood star-free (NSF for short) graph is one in which every vertex is contained in a triangle except pendant vertices and isolated vertices. This class has been considered before for several contexts. In the present paper, we study the complexity of the dominating induced matching (DIM) problem and the perfect edge domination (PED) problem for NSF graphs. We prove the corresponding decision problems are NP-Complete for several of its subclasses. As an added value of this study, we have shown three connected variants of planar positive 1in3SAT are also NP-Complete. Since these variants are more basic in complexity theory context than many graph problems, these results can be useful to prove that other problems are NP-Complete. △ Less

Submitted 7 April, 2024; v1 submitted 25 April, 2022; originally announced April 2022.

MSC Class: 05C70; 05C85; 68R07; 68R10; 68Q25 ACM Class: G.2.2; F.2.m

arXiv:2110.02411 [pdf, other]

Voice Aging with Audio-Visual Style Transfer

Authors: Justin Wilson, Sunyeong Park, Seunghye J. Wilson, Ming C. Lin

Abstract: Face aging techniques have used generative adversarial networks (GANs) and style transfer learning to transform one's appearance to look younger/older. Identity is maintained by conditioning these generative networks on a learned vector representation of the source content. In this work, we apply a similar approach to age a speaker's voice, referred to as voice aging. We first analyze the classifi… ▽ More Face aging techniques have used generative adversarial networks (GANs) and style transfer learning to transform one's appearance to look younger/older. Identity is maintained by conditioning these generative networks on a learned vector representation of the source content. In this work, we apply a similar approach to age a speaker's voice, referred to as voice aging. We first analyze the classification of a speaker's age by training a convolutional neural network (CNN) on the speaker's voice and face data from Common Voice and VoxCeleb datasets. We generate aged voices from style transfer to transform an input spectrogram to various ages and demonstrate our method on a mobile app. △ Less

Submitted 5 October, 2021; originally announced October 2021.

arXiv:2110.02405 [pdf, other]

Echo-Reconstruction: Audio-Augmented 3D Scene Reconstruction

Authors: Justin Wilson, Nicholas Rewkowski, Ming C. Lin, Henry Fuchs

Abstract: Reflective and textureless surfaces such as windows, mirrors, and walls can be a challenge for object and scene reconstruction. These surfaces are often poorly reconstructed and filled with depth discontinuities and holes, making it difficult to cohesively reconstruct scenes that contain these planar discontinuities. We propose Echoreconstruction, an audio-visual method that uses the reflections o… ▽ More Reflective and textureless surfaces such as windows, mirrors, and walls can be a challenge for object and scene reconstruction. These surfaces are often poorly reconstructed and filled with depth discontinuities and holes, making it difficult to cohesively reconstruct scenes that contain these planar discontinuities. We propose Echoreconstruction, an audio-visual method that uses the reflections of sound to aid in geometry and audio reconstruction for virtual conferencing, teleimmersion, and other AR/VR experience. The mobile phone prototype emits pulsed audio, while recording video for RGB-based 3D reconstruction and audio-visual classification. Reflected sound and images from the video are input into our audio (EchoCNN-A) and audio-visual (EchoCNN-AV) convolutional neural networks for surface and sound source detection, depth estimation, and material classification. The inferences from these classifications enhance scene 3D reconstructions containing open spaces and reflective surfaces by depth filtering, inpainting, and placement of unmixed sound sources in the scene. Our prototype, VR demo, and experimental results from real-world and virtual scenes with challenging surfaces and sound indicate high success rates on classification of material, depth estimation, and closed/open surfaces, leading to considerable visual and audio improvement in 3D scenes (see Figure 1). △ Less

Submitted 5 October, 2021; originally announced October 2021.

arXiv:2110.02404 [pdf, other]

3D-MOV: Audio-Visual LSTM Autoencoder for 3D Reconstruction of Multiple Objects from Video

Authors: Justin Wilson, Ming C. Lin

Abstract: 3D object reconstructions of transparent and concave structured objects, with inferred material properties, remains an open research problem for robot navigation in unstructured environments. In this paper, we propose a multimodal single- and multi-frame neural network for 3D reconstructions using audio-visual inputs. Our trained reconstruction LSTM autoencoder 3D-MOV accepts multiple inputs to ac… ▽ More 3D object reconstructions of transparent and concave structured objects, with inferred material properties, remains an open research problem for robot navigation in unstructured environments. In this paper, we propose a multimodal single- and multi-frame neural network for 3D reconstructions using audio-visual inputs. Our trained reconstruction LSTM autoencoder 3D-MOV accepts multiple inputs to account for a variety of surface types and views. Our neural network produces high-quality 3D reconstructions using voxel representation. Based on Intersection-over-Union (IoU), we evaluate against other baseline methods using synthetic audio-visual datasets ShapeNet and Sound20K with impact sounds and bounding box annotations. To the best of our knowledge, our single- and multi-frame model is the first audio-visual reconstruction neural network for 3D geometry and material representation. △ Less

Submitted 5 October, 2021; originally announced October 2021.

arXiv:2109.07719 [pdf, other]

Efficient Differentiable Simulation of Articulated Bodies

Authors: Yi-Ling Qiao, Junbang Liang, Vladlen Koltun, Ming C. Lin

Abstract: We present a method for efficient differentiable simulation of articulated bodies. This enables integration of articulated body dynamics into deep learning frameworks, and gradient-based optimization of neural networks that operate on articulated bodies. We derive the gradients of the forward dynamics using spatial algebra and the adjoint method. Our approach is an order of magnitude faster than a… ▽ More We present a method for efficient differentiable simulation of articulated bodies. This enables integration of articulated body dynamics into deep learning frameworks, and gradient-based optimization of neural networks that operate on articulated bodies. We derive the gradients of the forward dynamics using spatial algebra and the adjoint method. Our approach is an order of magnitude faster than autodiff tools. By only saving the initial states throughout the simulation process, our method reduces memory requirements by two orders of magnitude. We demonstrate the utility of efficient differentiable dynamics for articulated bodies in a variety of applications. We show that reinforcement learning with articulated systems can be accelerated using gradients provided by our method. In applications to control and inverse problems, gradient-based optimization enabled by our work accelerates convergence by more than an order of magnitude. △ Less

Submitted 16 September, 2021; originally announced September 2021.

Comments: ICML 2021

arXiv:2102.13262 [pdf, other]

Improving Robustness of Learning-based Autonomous Steering Using Adversarial Images

Authors: Yu Shen, Laura Zheng, Manli Shu, Weizi Li, Tom Goldstein, Ming C. Lin

Abstract: For safety of autonomous driving, vehicles need to be able to drive under various lighting, weather, and visibility conditions in different environments. These external and environmental factors, along with internal factors associated with sensors, can pose significant challenges to perceptual data processing, hence affecting the decision-making and control of the vehicle. In this work, we address… ▽ More For safety of autonomous driving, vehicles need to be able to drive under various lighting, weather, and visibility conditions in different environments. These external and environmental factors, along with internal factors associated with sensors, can pose significant challenges to perceptual data processing, hence affecting the decision-making and control of the vehicle. In this work, we address this critical issue by introducing a framework for analyzing robustness of the learning algorithm w.r.t varying quality in the image input for autonomous driving. Using the results of sensitivity analysis, we further propose an algorithm to improve the overall performance of the task of "learning to steer". The results show that our approach is able to enhance the learning outcomes up to 48%. A comparative study drawn between our approach and other related techniques, such as data augmentation and adversarial training, confirms the effectiveness of our algorithm as a way to improve the robustness and generalization of neural network training for autonomous driving. △ Less

Submitted 25 February, 2021; originally announced February 2021.

arXiv:2008.07436 [pdf, other]

Multi-Agent Coverage in Urban Environments

Authors: Shivang Patel, Senthil Hariharan, Pranav Dhulipala, Ming C Lin, Dinesh Manocha, Huan Xu, Michael Otte

Abstract: We study multi-agent coverage algorithms for autonomous monitoring and patrol in urban environments. We consider scenarios in which a team of flying agents uses downward facing cameras (or similar sensors) to observe the environment outside of buildings at street-level. Buildings are considered obstacles that impede movement, and cameras are assumed to be ineffective above a maximum altitude. We s… ▽ More We study multi-agent coverage algorithms for autonomous monitoring and patrol in urban environments. We consider scenarios in which a team of flying agents uses downward facing cameras (or similar sensors) to observe the environment outside of buildings at street-level. Buildings are considered obstacles that impede movement, and cameras are assumed to be ineffective above a maximum altitude. We study multi-agent urban coverage problems related to this scenario, including: (1) static multi-agent urban coverage, in which agents are expected to observe the environment from static locations, and (2) dynamic multi-agent urban coverage where agents move continuously through the environment. We experimentally evaluate six different multi-agent coverage methods, including: three types of ergodic coverage (that avoid buildings in different ways), lawn-mower sweep, voronoi region based control, and a naive grid method. We evaluate all algorithms with respect to four performance metrics (percent coverage, revist count, revist time, and the integral of area viewed over time), across four types of urban environments [low density, high density] x [short buildings, tall buildings], and for team sizes ranging from 2 to 25 agents. We believe this is the first extensive comparison of these methods in an urban setting. Our results highlight how the relative performance of static and dynamic methods changes based on the ratio of team size to search area, as well the relative effects that different characteristics of urban environments (tall, short, dense, sparse, mixed) have on each algorithm. △ Less

Submitted 17 August, 2020; originally announced August 2020.

arXiv:2007.02168 [pdf, other]

Scalable Differentiable Physics for Learning and Control

Authors: Yi-Ling Qiao, Junbang Liang, Vladlen Koltun, Ming C. Lin

Abstract: Differentiable physics is a powerful approach to learning and control problems that involve physical objects and environments. While notable progress has been made, the capabilities of differentiable physics solvers remain limited. We develop a scalable framework for differentiable physics that can support a large number of objects and their interactions. To accommodate objects with arbitrary geom… ▽ More Differentiable physics is a powerful approach to learning and control problems that involve physical objects and environments. While notable progress has been made, the capabilities of differentiable physics solvers remain limited. We develop a scalable framework for differentiable physics that can support a large number of objects and their interactions. To accommodate objects with arbitrary geometry and topology, we adopt meshes as our representation and leverage the sparsity of contacts for scalable differentiable collision handling. Collisions are resolved in localized regions to minimize the number of optimization variables even when the number of simulated objects is high. We further accelerate implicit differentiation of optimization with nonlinear constraints. Experiments demonstrate that the presented framework requires up to two orders of magnitude less memory and computation in comparison to recent particle-based methods. We further validate the approach on inverse problems and control scenarios, where it outperforms derivative-free and model-free baselines by at least an order of magnitude. △ Less

Submitted 4 July, 2020; originally announced July 2020.

Journal ref: Proceedings of the 37th International Conference on Machine Learning, ICML 2020

arXiv:1908.09464 [pdf, other]

Shape-Aware Human Pose and Shape Reconstruction Using Multi-View Images

Authors: Junbang Liang, Ming C. Lin

Abstract: We propose a scalable neural network framework to reconstruct the 3D mesh of a human body from multi-view images, in the subspace of the SMPL model. Use of multi-view images can significantly reduce the projection ambiguity of the problem, increasing the reconstruction accuracy of the 3D human body under clothing. Our experiments show that this method benefits from the synthetic dataset generated… ▽ More We propose a scalable neural network framework to reconstruct the 3D mesh of a human body from multi-view images, in the subspace of the SMPL model. Use of multi-view images can significantly reduce the projection ambiguity of the problem, increasing the reconstruction accuracy of the 3D human body under clothing. Our experiments show that this method benefits from the synthetic dataset generated from our pipeline since it has good flexibility of variable control and can provide ground-truth for validation. Our method outperforms existing methods on real-world images, especially on shape estimations. △ Less

Submitted 26 August, 2019; originally announced August 2019.

Comments: To be published to ICCV 2019

arXiv:1907.08874 [pdf, other]

ADAPS: Autonomous Driving Via Principled Simulations

Authors: Weizi Li, David Wolinski, Ming C. Lin

Abstract: Autonomous driving has gained significant advancements in recent years. However, obtaining a robust control policy for driving remains challenging as it requires training data from a variety of scenarios, including rare situations (e.g., accidents), an effective policy architecture, and an efficient learning mechanism. We propose ADAPS for producing robust control policies for autonomous vehicles.… ▽ More Autonomous driving has gained significant advancements in recent years. However, obtaining a robust control policy for driving remains challenging as it requires training data from a variety of scenarios, including rare situations (e.g., accidents), an effective policy architecture, and an efficient learning mechanism. We propose ADAPS for producing robust control policies for autonomous vehicles. ADAPS consists of two simulation platforms in generating and analyzing accidents to automatically produce labeled training data, and a memory-enabled hierarchical control policy. Additionally, ADAPS offers a more efficient online learning mechanism that reduces the number of iterations required in learning compared to existing methods such as DAGGER. We present both theoretical and experimental results. The latter are produced in simulated environments, where qualitative and quantitative results are generated to demonstrate the benefits of ADAPS. △ Less

Submitted 20 July, 2019; originally announced July 2019.

Comments: Accepted to ICRA2019

arXiv:1902.08379 [pdf, other]

LSwarm: Efficient Collision Avoidance for Large Swarms with Coverage Constraints in Complex Urban Scenes

Authors: Senthil Hariharan Arul, Adarsh Jagan Sathyamoorthy, Shivang Patel, Michael Otte, Huan Xu, Ming C Lin, Dinesh Manocha

Abstract: In this paper, we address the problem of collision avoidance for a swarm of UAVs used for continuous surveillance of an urban environment. Our method, LSwarm, efficiently avoids collisions with static obstacles, dynamic obstacles and other agents in 3-D urban environments while considering coverage constraints. LSwarm computes collision avoiding velocities that (i) maximize the conformity of an ag… ▽ More In this paper, we address the problem of collision avoidance for a swarm of UAVs used for continuous surveillance of an urban environment. Our method, LSwarm, efficiently avoids collisions with static obstacles, dynamic obstacles and other agents in 3-D urban environments while considering coverage constraints. LSwarm computes collision avoiding velocities that (i) maximize the conformity of an agent to an optimal path given by a global coverage strategy and (ii) ensure sufficient resolution of the coverage data collected by each agent. Our algorithm is formulated based on ORCA (Optimal Reciprocal Collision Avoidance) and is scalable with respect to the size of the swarm. We evaluate the coverage performance of LSwarm in realistic simulations of a swarm of quadrotors in complex urban models. In practice, our approach can compute collision avoiding velocities for a swarm composed of tens to hundreds of agents in a few milliseconds on dense urban scenes consisting of tens of buildings. △ Less

Submitted 26 May, 2019; v1 submitted 22 February, 2019; originally announced February 2019.

Comments: 11 pages

arXiv:1810.12295 [pdf]

Estimating Traffic Conditions At Metropolitan Scale Using Traffic Flow Theory

Authors: Weizi Li, Meilei Jiang, Yaoyu Chen, Ming C. Lin

Abstract: The rapid urbanization and increasing traffic have serious social, economic, and environmental impact on metropolitan areas worldwide. It is of a great importance to understand the complex interplay of road networks and traffic conditions. The authors propose a novel framework to estimate traffic conditions at the metropolitan scale using GPS traces. Their approach begins with an initial estimatio… ▽ More The rapid urbanization and increasing traffic have serious social, economic, and environmental impact on metropolitan areas worldwide. It is of a great importance to understand the complex interplay of road networks and traffic conditions. The authors propose a novel framework to estimate traffic conditions at the metropolitan scale using GPS traces. Their approach begins with an initial estimation of network travel times by solving a convex optimization program based on traffic flow theory. Then, they iteratively refine the estimated network travel times and vehicle traversed paths. Last, the authors perform a bilevel optimization process to estimate traffic conditions on road segments that are not covered by GPS data. The evaluation and comparison of the authors' approach over two state-of-the-art methods show up to 96.57% relative improvements. The authors have further conducted field tests by coupling road networks of San Francisco and Beijing with real-world GIS data, which involve 128,701 nodes, 148,899 road segments, and over 26 million GPS traces. △ Less

Submitted 29 October, 2018; originally announced October 2018.

Comments: Accepted in Transportation Research Board 97th Annual Meeting, 2018

arXiv:1705.08379 [pdf, ps, other]

Perfect Edge Domination: Hard and Solvable Cases

Authors: Min Chih Lin, Vadim Lozin, Veronica A. Moyano, Jayme L. Szwarcfiter

Abstract: Let $G$ be an undirected graph. An edge of $G$ dominates itself and all edges adjacent to it. A subset $E'$ of edges of $G$ is an edge dominating set of $G$, if every edge of the graph is dominated by some edge of $E'$. We say that $E'$ is a perfect edge dominating set of $G$, if every edge not in $E'$ is dominated by exactly one edge of $E'$. The perfect edge dominating problem is to determine a… ▽ More Let $G$ be an undirected graph. An edge of $G$ dominates itself and all edges adjacent to it. A subset $E'$ of edges of $G$ is an edge dominating set of $G$, if every edge of the graph is dominated by some edge of $E'$. We say that $E'$ is a perfect edge dominating set of $G$, if every edge not in $E'$ is dominated by exactly one edge of $E'$. The perfect edge dominating problem is to determine a least cardinality perfect edge dominating set of $G$. For this problem, we describe two NP-completeness proofs, for the classes of claw-free graphs of degree at most 3, and for bounded degree graphs, of maximum degree at most $d \geq 3$ and large girth. In contrast, we prove that the problem admits an $O(n)$ time solution, for cubic claw-free graphs. In addition, we prove a complexity dichotomy theorem for the perfect edge domination problem, based on the results described in the paper. Finally, we describe a linear time algorithm for finding a minimum weight perfect edge dominating set of a $P_5$-free graph. The algorithm is robust, in the sense that, given an arbitrary graph $G$, either it computes a minimum weight perfect edge dominating set of $G$, or it exhibits an induced subgraph of $G$, isomorphic to a $P_5$. △ Less

Submitted 23 May, 2017; originally announced May 2017.

Showing 1–50 of 57 results for author: Lin, M C