-
Scene Graph-Guided Generative AI Framework for Synthesizing and Evaluating Industrial Hazard Scenarios
Authors:
Sanjay Acharjee,
Abir Khan Ratul,
Diego Patino,
Md Nazmus Sakib
Abstract:
Training vision models to detect workplace hazards accurately requires realistic images of unsafe conditions that could lead to accidents. However, acquiring such datasets is difficult because capturing accident-triggering scenarios as they occur is nearly impossible. To overcome this limitation, this study presents a novel scene graph-guided generative AI framework that synthesizes photorealistic images of hazardous scenarios grounded in historical Occupational Safety and Health Administration (OSHA) accident reports. OSHA narratives are analyzed using GPT-4o to extract structured hazard reasoning, which is converted into object-level scene graphs capturing spatial and contextual relationships essential for understanding risk. These graphs guide a text-to-image diffusion model to generate compositionally accurate hazard scenes. To evaluate the realism and semantic fidelity of the generated data, a visual question answering (VQA) framework is introduced. Across four state-of-the-art generative models, the proposed VQA Graph Score outperforms CLIP and BLIP metrics based on entropy-based validation, confirming its higher discriminative sensitivity.
Submitted 17 November, 2025;
originally announced November 2025.
-
Sketch2BIM: A Multi-Agent Human-AI Collaborative Pipeline to Convert Hand-Drawn Floor Plans to 3D BIM
Authors:
Abir Khan Ratul,
Sanjay Acharjee,
Somin Park,
Md Nazmus Sakib
Abstract:
This study introduces a human-in-the-loop pipeline that converts unscaled, hand-drawn floor plan sketches into semantically consistent 3D BIM models. The workflow leverages multimodal large language models (MLLMs) within a multi-agent framework, combining perceptual extraction, human feedback, schema validation, and automated BIM scripting. Initially, sketches are iteratively refined into a structured JSON layout of walls, doors, and windows. Later, these layouts are transformed into executable scripts that generate 3D BIM models. Experiments on ten diverse floor plans demonstrate strong convergence: openings (doors, windows) are captured with high reliability in the initial pass, while wall detection begins around 83% and achieves near-perfect alignment after a few feedback iterations. Across all categories, precision, recall, and F1 scores remain above 0.83, and geometric errors (RMSE, MAE) progressively decrease to zero through feedback corrections. This study demonstrates how MLLM-driven multi-agent reasoning can make BIM creation accessible to both experts and non-experts using only freehand sketches.
Submitted 16 October, 2025;
originally announced October 2025.
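The detection and geometry metrics named in the Sketch2BIM abstract above (precision, recall, F1, RMSE, MAE) can be illustrated with a minimal sketch. The function names and the matched-element counts are hypothetical; the abstract does not say how detected walls, doors, and windows are matched to the ground truth, so that step is assumed to have been done already.

```python
import math

def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics from true-positive, false-positive and
    false-negative counts (how elements are matched is an assumption)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def mae(predicted, reference):
    """Mean absolute error between paired measurements (e.g. wall lengths)."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)

def rmse(predicted, reference):
    """Root mean squared error between paired measurements."""
    return math.sqrt(
        sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    )
```

For example, `precision_recall_f1(5, 1, 1)` describes a pass in which five walls were matched correctly, one was spurious, and one was missed; both geometric errors are zero once the predicted dimensions equal the reference dimensions.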
-
Does Language Model Understand Language?
Authors:
Suvojit Acharjee,
Utathya Aich,
Asfak Ali
Abstract:
Despite advances in natural language generation and understanding, LMs still struggle with fine-grained linguistic phenomena such as tense, negation, voice, and modality, which are central to effective human communication. In the context of the United Nations SDG 4, where linguistic clarity is critical, the deployment of LMs in educational technologies demands careful scrutiny. As LMs increasingly power applications like tutoring systems, automated grading, and translation, their alignment with human linguistic interpretation becomes essential for effective learning. In this study, we conduct an evaluation of SOTA language models across these challenging contexts in both English and Bengali. To ensure a structured assessment, we introduce a new set of guidelines, the Route for Evaluation of Cognitive Inference in Systematic Environments. Our proposed LUCID dataset, composed of carefully crafted sentence pairs in English and Bengali, specifically challenges these models on critical aspects of language comprehension, including negation, tense, and voice variations. We assess the performance of SOTA models including MISTRAL-SABA-24B, LLaMA-4-Scout-17B, LLaMA-3.3-70B, Gemma2-9B, and Compound-Beta using standard metrics like Pearson correlation, Spearman correlation, and Mean Absolute Error, as well as a novel, linguistically inspired metric, the HCE accuracy. The HCE accuracy measures how often model predictions fall within one standard deviation of the mean human rating, thus capturing human-like tolerance for variability in language interpretation. Our findings highlight Compound-Beta as the most balanced model, consistently achieving high correlations and low MAEs across diverse language conditions. It records the highest Pearson correlation in English and demonstrates robust performance on mixed-language data, indicating a strong alignment with human judgments in cross-lingual scenarios.
Submitted 15 September, 2025;
originally announced September 2025.
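The HCE accuracy defined in the abstract above (the share of model predictions that fall within one standard deviation of the mean human rating) can be sketched as follows. The function name and data layout are hypothetical, and the use of the population standard deviation is an assumption; the abstract does not say which estimator the authors use.

```python
from statistics import mean, pstdev

def hce_accuracy(predictions, human_ratings):
    """Fraction of predictions within one standard deviation of the mean
    human rating for the same item.

    predictions   -- one model score per item
    human_ratings -- one list of human scores per item
    Assumption: population standard deviation (pstdev); the source abstract
    does not specify sample vs. population.
    """
    if len(predictions) != len(human_ratings):
        raise ValueError("predictions and human_ratings must align item by item")
    hits = 0
    for pred, ratings in zip(predictions, human_ratings):
        mu = mean(ratings)
        sigma = pstdev(ratings)
        if abs(pred - mu) <= sigma:
            hits += 1
    return hits / len(predictions)
```

With two items rated `[2, 3, 4]` and `[4, 4, 5]` by humans, a model that predicts `3.0` and `1.0` lands inside the tolerance band for the first item only, so half of its predictions count as human-consistent.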
-
Big data searching using words
Authors:
Santanu Acharjee,
Ripunjoy Choudhury
Abstract:
Big data analytics is one of the most promising areas of new research and development in computer science, enterprises, e-commerce, and defense. For many organizations, big data is considered one of their most important strategic assets. This explosive growth has made it necessary to develop effective techniques for examining and analyzing big data from mathematical perspectives. Among various methods of analyzing big data, topological data analysis (TDA) is now considered one of the useful tools. However, there is no fundamental concept related to the topological structure in big data. In this paper, we present fundamental concepts related to the neighborhood structures of words in big data search, laying the groundwork for developing topological frameworks for big data in the future. We also introduce the notion of big data primal within the context of big data search and explore how neighborhood structures, combined with the Jaccard similarity coefficient, can be utilized to detect anomalies in search behavior.
Submitted 15 July, 2025; v1 submitted 10 September, 2024;
originally announced September 2024.
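The Jaccard similarity coefficient mentioned in the abstract above is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to sets of search words; treating a query as a plain word set, comparing it with a baseline set, and the `threshold` value are all illustrative assumptions, not the neighborhood structures or the big data primal that the paper defines.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two word collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)

def flag_anomalies(baseline, queries, threshold=0.2):
    """Return the queries whose similarity to the baseline word set is
    below the (hypothetical) threshold."""
    return [q for q in queries if jaccard(baseline, q) < threshold]
```

For instance, a query sharing two of four distinct words with the baseline scores 0.5, while a query with no words in common scores 0 and would be flagged.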
-
Trust levels in social networks
Authors:
Santanu Acharjee,
Akhil Thomas Panicker
Abstract:
In this paper, we study trust levels in social networks from the perspective of Dunbar's number.
Submitted 4 March, 2022;
originally announced March 2022.
-
OptM3Sec: Optimizing Multicast IRS-Aided Multiantenna DFRC Secrecy Channel with Multiple Eavesdroppers
Authors:
Kumar Vijay Mishra,
Arpan Chattopadhyay,
Siddharth Sankar Acharjee,
Athina P. Petropulu
Abstract:
With the use of common signaling methods for dual-function radar-communications (DFRC) systems, the susceptibility of messages aimed at legitimate users to eavesdropping has worsened. For DFRC systems, the radar target may act as an eavesdropper (ED) that receives a high-energy signal, thereby leading to additional challenges. Unlike prior works, we consider a multicast multi-antenna DFRC system with multiple EDs. We then propose a physical layer design approach to maximize the secrecy rate by installing intelligent reflecting surfaces in the radar channels. Our optimization of multiple ED multicast multi-antenna DFRC secrecy rate (OptM3Sec) approach solves this highly nonconvex problem with respect to the precoding matrices. Our numerical experiments demonstrate the feasibility of our algorithm in maximizing the secrecy rate in this DFRC setup.
Submitted 23 January, 2022;
originally announced January 2022.
-
Lower Bounds for Shoreline Searching with 2 or More Robots
Authors:
Sumi Acharjee,
Konstantinos Georgiou,
Somnath Kundu,
Akshaya Srinivasan
Abstract:
Searching for a line on the plane with $n$ unit speed robots is a classic online problem that dates back to the 50's, and for which competitive ratio upper bounds are known for every $n\geq 1$. In this work we improve the best lower bound known for $n=2$ robots from 1.5993 to 3. Moreover we prove that the competitive ratio is at least $\sqrt{3}$ for $n=3$ robots, and at least $1/\cos(\pi/n)$ for $n\geq 4$ robots. Our lower bounds match the best upper bounds known for $n\geq 4$, hence resolving these cases. To the best of our knowledge, these are the first lower bounds proven for the cases $n\geq 3$ of this several decades old problem.
Submitted 13 January, 2020;
originally announced January 2020.
-
Multilevel Threshold Based Gray Scale Image Segmentation using Cuckoo Search
Authors:
Sourav Samantaa,
Nilanjan Dey,
Poulami Das,
Suvojit Acharjee,
Sheli Sinha Chaudhuri
Abstract:
Image segmentation is a technique for partitioning an original image into distinct classes. Many possible solutions may be available for segmenting an image into a certain number of classes, each with a different quality of segmentation. In our proposed method, a multilevel thresholding technique is used for image segmentation. A new approach based on Cuckoo Search (CS) is used to select the optimal threshold values. In other words, the algorithm is used to reach the best solution from the initial random threshold values (solutions), and a correlation function is used to evaluate the quality of a solution. Finally, MSE and PSNR are measured to assess the segmentation quality.
Submitted 1 July, 2013;
originally announced July 2013.
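Two pieces of the abstract above lend themselves to a short illustration: applying a set of thresholds to a gray scale image, and scoring a result with MSE and PSNR. The sketch below is generic and does not reproduce the paper's Cuckoo Search or its correlation-based fitness; the function names are hypothetical, images are plain nested lists, and an 8-bit range (maximum value 255) is assumed for PSNR.

```python
import math
from bisect import bisect_right

def apply_thresholds(image, thresholds):
    """Label each pixel with the index of the gray-level interval it falls
    into, given sorted thresholds (multilevel thresholding)."""
    return [[bisect_right(thresholds, p) for p in row] for row in image]

def mse(original, processed):
    """Mean squared error between two images of equal size."""
    flat_o = [p for row in original for p in row]
    flat_p = [p for row in processed for p in row]
    if len(flat_o) != len(flat_p):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(flat_o, flat_p)) / len(flat_o)

def psnr(original, processed, max_value=255):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit images."""
    err = mse(original, processed)
    if err == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / err)
```

With thresholds `[85, 170]`, pixels are split into three classes (labels 0, 1, 2). A lower MSE and a higher PSNR both indicate that the processed image is closer to the original.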
-
Medical Information Embedding in Compressed Watermarked Intravascular Ultrasound Video
Authors:
Nilanjan Dey,
Suvojit Acharjee,
Debalina Biswas,
Achintya Das,
Sheli Sinha Chaudhuri
Abstract:
In the medical field, intravascular ultrasound (IVUS) is a tomographic imaging modality that can identify the boundaries of different layers of blood vessels. IVUS can detect myocardial infarction (heart attack) that remains ignored and unattended when only angioplasty is done. During the past decade, it became easier for some individuals or groups to copy and transmit digital information without the permission of the owner. To increase authentication and the security of copyrights, digital watermarking, an information hiding technique, was introduced. Achieving watermarking with little distortion in biomedical data is a challenging task. A watermark can be embedded into an image or a video. Because video data carries a huge amount of information, it needs a large storage area, which is not feasible. In this case, motion vector based video compression is used to reduce its size. In this paper, an Electronic Patient Record (EPR) is embedded as a watermark within an IVUS video and then the motion vector is calculated. The proposed method proves robust, as the extracted watermark has a good PSNR value and low MSE.
Submitted 9 March, 2013;
originally announced March 2013.