-
CornOrb: A Multimodal Dataset of Orbscan Corneal Topography and Clinical Annotations for Keratoconus Detection
Authors:
Mohammed El Amine Lazouni,
Leila Ryma Lazouni,
Zineb Aziza Elaouaber,
Mohammed Ammar,
Sofiane Zehar,
Mohammed Youcef Bouayad Agha,
Ahmed Lazouni,
Amel Feroui,
Ali H. Al-Timemy,
Siamak Yousefi,
Mostafa El Habib Daho
Abstract:
In this paper, we present CornOrb, a publicly accessible multimodal dataset of Orbscan corneal topography images and clinical annotations collected from patients in Algeria. The dataset comprises 1,454 eyes from 744 patients, including 889 normal eyes and 565 keratoconus cases. For each eye, four corneal maps are provided (axial curvature, anterior elevation, posterior elevation, and pachymetry), together with structured tabular data including demographic information and key clinical parameters such as astigmatism, maximum keratometry (Kmax), central and thinnest pachymetry, and anterior/posterior asphericity.
All data were retrospectively acquired, fully anonymized, and pre-processed into standardized PNG and CSV formats to ensure direct usability for artificial intelligence research. This dataset represents one of the first large-scale Orbscan-based resources from Africa, specifically built to enable robust AI-driven detection and analysis of keratoconus using multimodal data. The data are openly available at Zenodo.
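Since the abstract describes paired PNG maps and CSV annotations per eye, a minimal loading sketch may help illustrate the multimodal structure. The column names, file-naming scheme, and values below are hypothetical; consult the actual Zenodo record for the real layout.

```python
# Sketch of pairing CornOrb-style tabular records with the four per-eye
# corneal maps. File names and column names are hypothetical.
import csv
import io

MAP_TYPES = ["axial", "anterior_elevation", "posterior_elevation", "pachymetry"]

# Stand-in for the dataset's CSV of clinical annotations.
CSV_TEXT = """eye_id,age,sex,kmax,thinnest_pachy,label
0001_OD,27,F,59.3,441,keratoconus
0002_OS,35,M,43.1,552,normal
"""

def load_records(csv_text):
    """Parse clinical rows and attach the expected PNG paths per eye."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["kmax"] = float(row["kmax"])
        row["thinnest_pachy"] = int(row["thinnest_pachy"])
        # One PNG per map type, keyed by eye id (hypothetical naming scheme).
        row["maps"] = {m: f"maps/{row['eye_id']}_{m}.png" for m in MAP_TYPES}
        records.append(row)
    return records

records = load_records(CSV_TEXT)
print(records[0]["label"], records[0]["maps"]["pachymetry"])
```

A multimodal keratoconus classifier would then consume the four map images alongside the tabular features of each record.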
Submitted 22 March, 2026;
originally announced March 2026.
-
Noise-Aware Misclassification Attack Detection in Collaborative DNN Inference
Authors:
Shima Yousefi,
Saptarshi Debroy
Abstract:
Collaborative inference of object classification Deep Neural Networks (DNNs), where resource-constrained end-devices offload partially processed data to remote edge servers to complete end-to-end processing, is becoming a key enabler of edge-AI. However, such edge-offloading is vulnerable to malicious data injections that lead to stealthy misclassifications which are difficult to detect, especially in the presence of environmental noise. In this paper, we propose a semi-gray-box and noise-aware anomaly detection framework fueled by a variational autoencoder (VAE) to capture deviations caused by adversarial manipulation. The proposed framework incorporates a robust noise-aware feature that captures the characteristic behavior of environmental noise to improve detection accuracy while reducing false-alarm rates. Our evaluation with popular object classification DNNs demonstrates the robustness of the proposed detection (up to 90% AUROC across DNN configurations) under realistic noisy conditions while revealing limitations caused by feature similarity and elevated noise levels.
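The scoring step of such a detector can be sketched as follows: flag an input when its VAE reconstruction error, normalized by an estimated environmental-noise level, exceeds a threshold calibrated on benign data. The VAE is stubbed out here and all names are illustrative, not the paper's implementation.

```python
# Sketch of noise-aware anomaly scoring with a (stubbed) VAE.
import numpy as np

rng = np.random.default_rng(0)

def reconstruct(x):
    # Stand-in for a trained VAE: benign inputs reconstruct well.
    return x * 0.95

def anomaly_score(x, noise_sigma):
    err = np.mean((x - reconstruct(x)) ** 2)
    # Noise-aware normalization: discount error explained by ambient noise.
    return err / (noise_sigma ** 2 + 1e-8)

# Calibrate a threshold on benign samples at a fixed false-alarm budget.
benign = [rng.normal(0.0, 1.0, size=64) for _ in range(200)]
scores = [anomaly_score(x, noise_sigma=1.0) for x in benign]
threshold = np.percentile(scores, 95)  # ~5% false-alarm rate on benign data

adversarial = rng.normal(0.0, 1.0, size=64) + 3.0  # injected perturbation
print(anomaly_score(adversarial, noise_sigma=1.0) > threshold)
```

Dividing by the noise variance is one simple way to keep the false-alarm rate stable as ambient noise grows, which is the trade-off the abstract highlights.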
Submitted 18 March, 2026;
originally announced March 2026.
-
Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning
Authors:
Martina G. Vilas,
Safoora Yousefi,
Besmira Nushi,
Eric Horvitz,
Vidhisha Balachandran
Abstract:
Reasoning models improve their problem-solving ability through inference-time scaling, allocating more compute via longer token budgets. Identifying which reasoning traces are likely to succeed remains a key opportunity: reliably predicting productive paths can substantially reduce wasted computation and improve overall efficiency. We introduce Latent-Trajectory signals that characterize the temporal evolution of a model's internal representations during the generation of intermediate reasoning tokens. By measuring the overall change in latent representations between the start and end of reasoning, the change accumulated across intermediate steps, and the extent to which these changes advance toward the final state, we show that these signals predict solution accuracy more reliably than both cross-layer metrics and output-based confidence measures. When used to guide answer selection across multiple sampled generations, Latent-Trajectory signals make test-time scaling more effective and efficient than majority voting, reducing token usage by up to 70% while preserving and even improving accuracy by 2.6% on average. Moreover, these predictive signals often emerge early in the reasoning trace, enabling early selection and allocation of compute to the most promising candidates. Our findings contribute not only practical strategies for inference-time efficiency, but also a deeper interpretability perspective on how reasoning processes are represented and differentiated in latent space.
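The three quantities described (net start-to-end change, accumulated change across steps, and directness of progress toward the final state) can be sketched directly from a sequence of hidden states. Toy vectors stand in for a transformer's per-step representations; the function names are illustrative.

```python
# Sketch of latent-trajectory signals over per-step hidden states h_t.
import numpy as np

def trajectory_signals(H):
    """H: (T, d) array of hidden states across T reasoning steps."""
    overall = np.linalg.norm(H[-1] - H[0])       # net start-to-end change
    steps = np.linalg.norm(np.diff(H, axis=0), axis=1)
    accumulated = steps.sum()                    # total path length
    progress = overall / (accumulated + 1e-8)    # 1.0 = perfectly direct
    return overall, accumulated, progress

rng = np.random.default_rng(1)
direct = np.cumsum(np.ones((10, 4)) * 0.5, axis=0)            # steady advance
wander = np.cumsum(rng.normal(0, 0.5, size=(10, 4)), axis=0)  # meandering trace

print(trajectory_signals(direct)[2])   # ≈ 1.0: every step advances toward the end
```

Under this reading, a trace whose progress ratio is high spends little computation backtracking in latent space, which is one plausible reason such signals could predict success early.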
Submitted 12 October, 2025;
originally announced October 2025.
-
Detection of Misreporting Attacks on Software-Defined Immersive Environments
Authors:
Sourya Saha,
Md Nurul Absur,
Shima Yousefi,
Saptarshi Debroy
Abstract:
The ability to centrally control network infrastructure through a programmable middleware has made Software-Defined Networking (SDN) ideal for emerging applications such as immersive environments. However, this flexibility introduces new vulnerabilities, such as load imbalance caused by switch misreporting, which in turn leaves such immersive environments open to severe quality degradation. In this paper, we present a hybrid machine learning (ML)-based network anomaly detection framework that identifies such stealthy misreporting by capturing temporal inconsistencies in switch-reported loads, thereby countering potentially catastrophic quality degradation of hosted immersive applications. The detection system combines unsupervised anomaly scoring with supervised classification to robustly distinguish malicious behavior. Data collected from a realistic testbed deployment under both benign and adversarial conditions is used to train and evaluate the model. Experimental results show that the framework achieves high recall in detecting misreporting behavior, making it effective for early and reliable detection in SDN environments.
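The temporal-inconsistency idea can be sketched as scoring each switch-reported load against a short-horizon prediction (here an exponential moving average), with the supervised stage reduced to a simple threshold. This is an illustrative stand-in, not the paper's hybrid ML pipeline.

```python
# Sketch of flagging misreported loads via temporal inconsistency.
import numpy as np

def inconsistency_scores(loads, alpha=0.3):
    """Unsupervised stage: |report - EWMA prediction| per time step."""
    ewma, scores = loads[0], []
    for x in loads[1:]:
        scores.append(abs(x - ewma))
        ewma = alpha * x + (1 - alpha) * ewma
    return np.array(scores)

rng = np.random.default_rng(2)
benign = 50 + rng.normal(0, 1.0, 100)   # honest load reports
attack = benign.copy()
attack[60:] -= 20                       # switch under-reports its load

# Supervised stage reduced to a threshold fit on benign traces.
threshold = inconsistency_scores(benign).max() * 1.5
flagged = inconsistency_scores(attack) > threshold
print(flagged.any())
```

In the full framework a trained classifier would replace the fixed threshold, letting the system separate benign traffic shifts from adversarial misreporting.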
Submitted 22 September, 2025;
originally announced September 2025.
-
AdVAR-DNN: Adversarial Misclassification Attack on Collaborative DNN Inference
Authors:
Shima Yousefi,
Motahare Mounesan,
Saptarshi Debroy
Abstract:
In recent years, Deep Neural Networks (DNNs) have become increasingly integral to IoT-based environments, enabling real-time visual computing. However, the limited computational capacity of these devices has motivated the adoption of collaborative DNN inference, where the IoT device offloads part of the inference-related computation to a remote server. Such offloading often requires dynamic DNN partitioning information to be exchanged among the participants over an unsecured network or via relays/hops, leading to novel privacy vulnerabilities. In this paper, we propose AdVAR-DNN, an adversarial variational autoencoder (VAE)-based misclassification attack that leverages classifiers to detect model information and a VAE to generate untraceable manipulated samples, specifically designed to compromise the collaborative inference process. The AdVAR-DNN attack exploits the sensitive-information-exchange vulnerability of collaborative DNN inference and is black-box in nature, requiring no prior knowledge of the DNN model or how it is partitioned. Our evaluation using the most popular object classification DNNs on the CIFAR-100 dataset demonstrates the effectiveness of AdVAR-DNN in terms of high attack success rate with little to no probability of detection.
Submitted 1 August, 2025;
originally announced August 2025.
-
Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks
Authors:
Shakir Yousefi,
Andreas Plesner,
Till Aczel,
Roger Wattenhofer
Abstract:
Modern neural networks demonstrate state-of-the-art performance on numerous existing benchmarks; however, their high computational requirements and energy consumption prompt researchers to seek more efficient solutions for real-world deployment. Logic gate networks (LGNs) learn a large network of logic gates for efficient image classification. However, learning a network that can solve even a simple problem like CIFAR-10 can take days to weeks. Even then, almost half of the network remains unused, causing a discretization gap. This gap hinders real-world deployment of LGNs, as the performance drop between training and inference negatively impacts accuracy. We inject Gumbel noise with a straight-through estimator during training to significantly speed up training, improve neuron utilization, and decrease the discretization gap. We theoretically show that this results from implicit Hessian regularization, which improves the convergence properties of LGNs. We train networks $4.5 \times$ faster in wall-clock time, reduce the discretization gap by $98\%$, and reduce the number of unused gates by $100\%$.
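The gate-selection step with Gumbel noise can be sketched as below: the forward pass uses the hard argmax gate, while the backward pass (not shown, since there is no autograd here) would route gradients through the soft softmax, which is the straight-through trick. A toy subset of 4 of the 16 binary logic gates stands in for a full LGN neuron.

```python
# Sketch of Gumbel-noised gate selection with a straight-through forward/backward split.
import numpy as np

rng = np.random.default_rng(3)
GATES = [
    lambda a, b: a & b,        # AND
    lambda a, b: a | b,        # OR
    lambda a, b: a ^ b,        # XOR
    lambda a, b: 1 - (a & b),  # NAND
]

def select_gate(logits, tau=1.0):
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    noisy = (logits + gumbel) / tau
    soft = np.exp(noisy - noisy.max())
    soft /= soft.sum()          # relaxed distribution (used for gradients)
    hard = np.argmax(noisy)     # discrete choice (used in the forward pass)
    # Straight-through: forward uses `hard`, gradients flow through `soft`.
    return hard, soft

logits = np.array([0.1, 0.2, 5.0, 0.1])  # training has strongly favored XOR
hard, soft = select_gate(logits)
print(hard, GATES[hard](1, 0))
```

The injected noise perturbs which gate wins the argmax during training, which is one intuition for why it increases gate utilization and shrinks the train/inference gap.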
Submitted 30 October, 2025; v1 submitted 9 June, 2025;
originally announced June 2025.
-
Phi-4-reasoning Technical Report
Authors:
Marah Abdin,
Sahaj Agarwal,
Ahmed Awadallah,
Vidhisha Balachandran,
Harkirat Behl,
Lingjiao Chen,
Gustavo de Rosa,
Suriya Gunasekar,
Mojan Javaheripi,
Neel Joshi,
Piero Kauffmann,
Yash Lara,
Caio César Teodoro Mendes,
Arindam Mitra,
Besmira Nushi,
Dimitris Papailiopoulos,
Olli Saarikivi,
Shital Shah,
Vaishnavi Shrivastava,
Vibhav Vineet,
Yue Wu,
Safoora Yousefi,
Guoqing Zheng
Abstract:
We introduce Phi-4-reasoning, a 14-billion-parameter reasoning model that achieves strong performance on complex reasoning tasks. Trained via supervised fine-tuning of Phi-4 on a carefully curated set of "teachable" prompts, selected for the right level of complexity and diversity, together with reasoning demonstrations generated using o3-mini, Phi-4-reasoning generates detailed reasoning chains that effectively leverage inference-time compute. We further develop Phi-4-reasoning-plus, a variant enhanced through a short phase of outcome-based reinforcement learning that offers higher performance by generating longer reasoning traces. Across a wide range of reasoning tasks, both models outperform significantly larger open-weight models such as DeepSeek-R1-Distill-Llama-70B and approach the performance of the full DeepSeek-R1 model. Our comprehensive evaluations span benchmarks in math and scientific reasoning, coding, algorithmic problem solving, planning, and spatial understanding. Interestingly, we observe a non-trivial transfer of improvements to general-purpose benchmarks as well. In this report, we provide insights into our training data, our training methodologies, and our evaluations. We show that the benefit of careful data curation for supervised fine-tuning (SFT) extends to reasoning language models and can be further amplified by reinforcement learning (RL). Finally, our evaluation points to opportunities for improving how we assess the performance and robustness of reasoning models.
Submitted 30 April, 2025;
originally announced April 2025.
-
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead
Authors:
Vidhisha Balachandran,
Jingya Chen,
Lingjiao Chen,
Shivam Garg,
Neel Joshi,
Yash Lara,
John Langford,
Besmira Nushi,
Vibhav Vineet,
Yue Wu,
Safoora Yousefi
Abstract:
Inference-time scaling can enhance the reasoning capabilities of large language models (LLMs) on complex problems that benefit from step-by-step problem solving. Although lengthening generated scratchpads has proven effective for mathematical tasks, the broader impact of this approach on other tasks remains less clear. In this work, we investigate the benefits and limitations of scaling methods across nine state-of-the-art models and eight challenging tasks, including math and STEM reasoning, calendar planning, NP-hard problems, navigation, and spatial reasoning. We compare conventional models (e.g., GPT-4o) with models fine-tuned for inference-time scaling (e.g., o1) through evaluation protocols that involve repeated model calls, either independently or sequentially with feedback. These evaluations approximate lower and upper performance bounds and potential for future performance improvements for each model, whether through enhanced training or multi-model inference systems. Our extensive empirical analysis reveals that the advantages of inference-time scaling vary across tasks and diminish as problem complexity increases. In addition, simply using more tokens does not necessarily translate to higher accuracy in these challenging regimes. Results from multiple independent runs with conventional models using perfect verifiers show that, for some tasks, these models can achieve performance close to the average performance of today's most advanced reasoning models. However, for other tasks, a significant performance gap remains, even in very high scaling regimes. Encouragingly, all models demonstrate significant gains when inference is further scaled with perfect verifiers or strong feedback, suggesting ample potential for future improvements.
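The "repeated independent calls with a perfect verifier" protocol has a simple closed form: a problem counts as solved if any of $k$ samples is correct, so aggregate accuracy is pass@k = 1 - (1 - p)^k for per-call accuracy p. A toy simulation illustrates this; the numbers are made up, not the paper's results.

```python
# Sketch of best-of-k evaluation with a perfect verifier.
import numpy as np

rng = np.random.default_rng(4)

def pass_at_k(p, k):
    """Probability that at least one of k independent calls succeeds."""
    return 1 - (1 - p) ** k

# Simulate 1000 problems where each call succeeds with probability 0.3.
p, k, n = 0.3, 8, 1000
samples = rng.uniform(size=(n, k)) < p
solved = samples.any(axis=1).mean()  # empirical best-of-k with verifier

print(round(pass_at_k(p, k), 3), round(solved, 3))
```

This aggregation approximates an upper performance bound: it credits the model whenever any sampled trace succeeds, whereas majority voting or sequential feedback protocols yield tighter, more realistic estimates.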
Submitted 31 March, 2025;
originally announced April 2025.
-
Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models
Authors:
Mazda Moayeri,
Vidhisha Balachandran,
Varun Chandrasekaran,
Safoora Yousefi,
Thomas Fel,
Soheil Feizi,
Besmira Nushi,
Neel Joshi,
Vibhav Vineet
Abstract:
With models getting stronger, evaluations have grown more complex, testing multiple skills in one benchmark and even in the same instance at once. However, skill-wise performance is obscured when inspecting aggregate accuracy, under-utilizing the rich signal modern benchmarks contain. We propose an automatic approach to recover the underlying skills relevant for any evaluation instance, by way of inspecting model-generated rationales. After validating the relevance of rationale-parsed skills and inferring skills for $46$k instances over $12$ benchmarks, we observe many skills to be common across benchmarks, resulting in the curation of hundreds of skill-slices (i.e. sets of instances testing a common skill). Inspecting accuracy over these slices yields novel insights on model trade-offs: e.g., compared to GPT-4o and Claude 3.5 Sonnet, on average, Gemini 1.5 Pro is $18\%$ more accurate in "computing molar mass", but $19\%$ less accurate in "applying constitutional law", despite the overall accuracies of the three models differing by a mere $0.4\%$. Furthermore, we demonstrate the practical utility of our approach by showing that insights derived from skill slice analysis can generalize to held-out instances: when routing each instance to the model strongest on the relevant skills, we see a $3\%$ accuracy improvement over our $12$ dataset corpus. Our skill-slices and framework open a new avenue in model evaluation, leveraging skill-specific analyses to unlock a more granular and actionable understanding of model capabilities.
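The slice-and-route logic can be sketched with plain dictionaries: group instances by skill, compute each model's accuracy per slice, then send new instances to the model strongest on the relevant skill. The data and model names below are made up for illustration.

```python
# Sketch of skill-slice accuracy and skill-based routing.
from collections import defaultdict

# (skill, model, correct) correctness outcomes; toy data.
results = [
    ("computing molar mass", "model_a", 1), ("computing molar mass", "model_a", 1),
    ("computing molar mass", "model_b", 0), ("computing molar mass", "model_b", 1),
    ("applying constitutional law", "model_a", 0), ("applying constitutional law", "model_a", 0),
    ("applying constitutional law", "model_b", 1), ("applying constitutional law", "model_b", 1),
]

slice_outcomes = defaultdict(list)
for skill, model, correct in results:
    slice_outcomes[(skill, model)].append(correct)
acc = {key: sum(v) / len(v) for key, v in slice_outcomes.items()}

def route(skill, models=("model_a", "model_b")):
    """Pick the model with the best accuracy on this instance's skill slice."""
    return max(models, key=lambda m: acc.get((skill, m), 0.0))

print(route("computing molar mass"), route("applying constitutional law"))
```

Routing on held-out instances then tests whether per-slice accuracies generalize, which is the 3% corpus-level gain the abstract reports.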
Submitted 24 October, 2024; v1 submitted 17 October, 2024;
originally announced October 2024.
-
Improving Instruction-Following in Language Models through Activation Steering
Authors:
Alessandro Stolfo,
Vidhisha Balachandran,
Safoora Yousefi,
Eric Horvitz,
Besmira Nushi
Abstract:
The ability to follow instructions is crucial for numerous real-world applications of language models. In pursuit of deeper insights and more powerful capabilities, we derive instruction-specific vector representations from language models and use them to steer models accordingly. These vectors are computed as the difference in activations between inputs with and without instructions, enabling a modular approach to activation steering. We demonstrate how this method can enhance model adherence to constraints such as output format, length, and word inclusion, providing inference-time control over instruction following. Our experiments across four models demonstrate how we can use the activation vectors to guide models to follow constraints even without explicit instructions and to enhance performance when instructions are present. Additionally, we explore the compositionality of activation steering, successfully applying multiple instructions simultaneously. Finally, we demonstrate that steering vectors computed on instruction-tuned models can transfer to improve base models. Our findings demonstrate that activation steering offers a practical and scalable approach for fine-grained control in language generation. Our code and data are available at https://github.com/microsoft/llm-steer-instruct.
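The steering-vector recipe described above (difference in activations with and without an instruction, applied at inference) can be sketched with toy vectors standing in for a real model's layer activations; the function names are illustrative.

```python
# Sketch of computing and applying an instruction steering vector.
import numpy as np

def steering_vector(acts_with, acts_without):
    """Mean activation difference across paired prompts at one layer."""
    return np.mean(np.asarray(acts_with) - np.asarray(acts_without), axis=0)

def apply_steering(hidden, vec, scale=1.0):
    """Add the (scaled) steering vector to a hidden state at inference."""
    return hidden + scale * vec

rng = np.random.default_rng(5)
base = rng.normal(size=(16, 8))                   # activations, no instruction
with_instr = base + np.array([0.0] * 7 + [2.0])   # instruction shifts one direction

vec = steering_vector(with_instr, base)
steered = apply_steering(rng.normal(size=8), vec)
print(np.round(vec, 1))
```

Because the vectors are just additive offsets, composing multiple instructions amounts to summing their vectors, which matches the compositionality experiments the abstract mentions.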
Submitted 14 April, 2025; v1 submitted 15 October, 2024;
originally announced October 2024.
-
Eureka: Evaluating and Understanding Large Foundation Models
Authors:
Vidhisha Balachandran,
Jingya Chen,
Neel Joshi,
Besmira Nushi,
Hamid Palangi,
Eduardo Salinas,
Vibhav Vineet,
James Woffinden-Luey,
Safoora Yousefi
Abstract:
Rigorous and reproducible evaluation is critical for assessing the state of the art and for guiding scientific advances in Artificial Intelligence. Evaluation is challenging in practice due to several reasons, including benchmark saturation, lack of transparency in methods used for measurement, development challenges in extracting measurements for generative tasks, and, more generally, the extensive number of capabilities required for a well-rounded comparison across models. We make three contributions to alleviate the above challenges. First, we present Eureka, an open-source framework for standardizing evaluations of large foundation models beyond single-score reporting and rankings. Second, we introduce Eureka-Bench as an extensible collection of benchmarks testing capabilities that (i) are still challenging for state-of-the-art models and (ii) represent fundamental but overlooked language and multimodal capabilities. The inherent space for improvement in non-saturated benchmarks enables us to discover meaningful differences between models at a capability level. Third, using Eureka, we conduct an analysis of 12 state-of-the-art models, providing in-depth insights into failure understanding and model comparison, which can be leveraged to plan targeted improvements. In contrast to recent trends in reports and leaderboards showing absolute rankings and claims for one model or another to be the best, our analysis shows that there is no such best model. Different models have different strengths, but there are models that appear more often than others as best performers for some capabilities. Despite the recent improvements, current models still struggle with several fundamental capabilities, including detailed image understanding, benefiting from multimodal input when available rather than fully relying on language, factuality and grounding for information retrieval, and over-refusals.
Submitted 13 September, 2024;
originally announced September 2024.
-
Multi-Stream TSN Gate Control Scheduling in the Presence of Clock Synchronization
Authors:
Aviroop Ghosh,
Saleh Yousefi,
Thomas Kunz
Abstract:
With the advancement of technologies like Industry 4.0, communication networks must meet stringent requirements of applications demanding deterministic and bounded latencies. The problem is further compounded by the need to periodically synchronize network devices to a common time reference to address clock drifts. Existing solutions often simplify the problem by assuming either perfect synchronization or a worst-case error. Additionally, these approaches delay the scheduling process in network devices until the scheduled frame is guaranteed to have arrived in the device queue, inducing additional delays to the stream. We propose a novel approach that completely avoids queuing delays, enabling it to meet even the strictest deadline requirements. Furthermore, both approaches can be enhanced by incorporating network-derived time-synchronization information. This is not only convenient for meeting deadline requirements but also improves bandwidth efficiency.
Submitted 11 July, 2024;
originally announced July 2024.
-
A Comprehensive Review of Artificial Intelligence Applications in Major Retinal Conditions
Authors:
Hina Raja,
Taimur Hassan,
Bilal Hassan,
Muhammad Usman Akram,
Hira Raja,
Alaa A Abd-alrazaq,
Siamak Yousefi,
Naoufel Werghi
Abstract:
This paper provides a systematic survey of retinal diseases that cause visual impairments or blindness, emphasizing the importance of early detection for effective treatment. It covers both clinical and automated approaches for detecting retinal disease, focusing on studies from the past decade. The survey evaluates various algorithms for identifying structural abnormalities and diagnosing retinal diseases, and it identifies future research directions based on a critical analysis of existing literature. This comprehensive study, which reviews both clinical and automated detection methods using different modalities, appears to be unique in its scope. Additionally, the survey serves as a helpful guide for researchers interested in digital retinopathy.
Submitted 22 November, 2023;
originally announced November 2023.
-
DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency
Authors:
Azhar Shaikh,
Michael Cochez,
Denis Diachkov,
Michiel de Rijcke,
Sahar Yousefi
Abstract:
This paper introduces DONUT-hole, a sparse OCR-free visual document understanding (VDU) model that addresses the limitations of its predecessor model, dubbed DONUT. The DONUT model leverages a transformer architecture to overcome the challenges of separate optical character recognition (OCR) and visual semantic understanding (VSU) components. However, its deployment in production environments and on edge devices is hindered by high memory and computational demands, particularly in large-scale request services. To overcome these challenges, we propose an optimization strategy based on knowledge distillation and model pruning. Our paradigm produces DONUT-hole, reducing the model density by 54\% while preserving performance. We also achieve a global representational similarity index of 0.79 between DONUT and DONUT-hole based on the centered kernel alignment (CKA) metric. Moreover, we evaluate the effectiveness of DONUT-hole on the document image key information extraction (KIE) task, highlighting its potential for developing more efficient VDU systems for logistics companies.
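Linear centered kernel alignment, the similarity metric behind the 0.79 figure, has a compact closed form that can be sketched directly. X and Y are (examples, features) activation matrices from the two models; the data below is a toy stand-in.

```python
# Sketch of linear CKA between two activation matrices.
import numpy as np

def linear_cka(X, Y):
    """Linear CKA: 1.0 for identical representations, near 0 for unrelated ones."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 10))       # activations of the teacher on 50 inputs
Y_diff = rng.normal(size=(50, 10))  # unrelated activations

print(round(linear_cka(X, X), 2))   # identical representations -> 1.0
```

A high CKA between teacher and pruned student suggests the sparsification preserved the internal representation geometry, not just the end-task accuracy.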
Submitted 9 November, 2023;
originally announced November 2023.
-
Decoding In-Context Learning: Neuroscience-inspired Analysis of Representations in Large Language Models
Authors:
Safoora Yousefi,
Leo Betthauser,
Hosein Hasanbeig,
Raphaël Millière,
Ida Momennejad
Abstract:
Large language models (LLMs) exhibit remarkable performance improvement through in-context learning (ICL) by leveraging task-specific examples in the input. However, the mechanisms behind this improvement remain elusive. In this work, we investigate how LLM embeddings and attention representations change following in-context learning, and how these changes mediate improvement in behavior. We employ neuroscience-inspired techniques such as representational similarity analysis (RSA) and propose novel methods for parameterized probing and for measuring the ratio of attention to relevant vs. irrelevant information in Llama-2 70B and Vicuna 13B. We designed two tasks with a priori relationships among their conditions: linear regression and reading comprehension. We formed hypotheses about expected similarities in task representations and measured hypothesis alignment of LLM representations before and after ICL, as well as changes in attention. Our analyses revealed a meaningful correlation between improvements in behavior after ICL and changes in both embeddings and attention weights across LLM layers. This empirical framework enables a nuanced understanding of how latent representations shape LLM behavior, offering valuable tools and insights for future research and practical applications.
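The RSA technique mentioned above can be sketched in a few lines: build each system's representational dissimilarity matrix (RDM) over the same conditions, then correlate the RDMs' upper triangles. Toy embeddings stand in for LLM activations, and Pearson correlation stands in for the rank correlation often used in practice.

```python
# Sketch of representational similarity analysis (RSA) between two embedding sets.
import numpy as np

def rdm(embeddings):
    """Pairwise Euclidean dissimilarity matrix from (n_conditions, dim) input."""
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def rsa(emb_a, emb_b):
    """Correlate the upper triangles of the two RDMs."""
    iu = np.triu_indices(len(emb_a), k=1)
    return np.corrcoef(rdm(emb_a)[iu], rdm(emb_b)[iu])[0, 1]

rng = np.random.default_rng(7)
before = rng.normal(size=(12, 32))   # condition embeddings before ICL
after_icl = before * 2.0             # same geometry, rescaled
unrelated = rng.normal(size=(12, 32))

print(round(rsa(before, after_icl), 3))  # scaling preserves geometry -> 1.0
```

Because RSA compares geometries rather than raw vectors, it can relate representations across layers, models, or before/after ICL even when their dimensionalities differ.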
Submitted 21 February, 2024; v1 submitted 30 September, 2023;
originally announced October 2023.
-
Identifying factors associated with fast visual field progression in patients with ocular hypertension based on unsupervised machine learning
Authors:
Xiaoqin Huang,
Asma Poursoroush,
Jian Sun,
Michael V. Boland,
Chris Johnson,
Siamak Yousefi
Abstract:
Purpose: To identify ocular hypertension (OHT) subtypes with different trends of visual field (VF) progression based on unsupervised machine learning and to discover factors associated with fast VF progression. Participants: A total of 3133 eyes of 1568 Ocular Hypertension Treatment Study (OHTS) participants with at least five follow-up VF tests were included in the study. Methods: We used a latent class mixed model (LCMM) to identify OHT subtypes from standard automated perimetry (SAP) mean deviation (MD) trajectories. We characterized the subtypes based on demographic, clinical, ocular, and VF factors at baseline. We then identified factors driving fast VF progression using generalized estimating equations (GEE) and justified the findings qualitatively and quantitatively. Results: The LCMM discovered four clusters (subtypes) of eyes with different trajectories of MD worsening. The numbers of eyes in the clusters were 794 (25%), 1675 (54%), 531 (17%), and 133 (4%). We labelled the clusters Improvers, Stables, Slow progressors, and Fast progressors based on their mean rates of MD change, which were 0.08, -0.06, -0.21, and -0.45 dB/year, respectively. Eyes with fast VF progression had higher baseline age, intraocular pressure (IOP), pattern standard deviation (PSD), and refractive error (RE), but lower central corneal thickness (CCT). Fast progression was associated with calcium channel blocker use, male sex, heart disease history, diabetes history, African American race, stroke history, and migraine headaches.
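The subtype labels can be illustrated by fitting a linear MD trend per eye and bucketing by slope, using the reported cluster means (0.08, -0.06, -0.21, -0.45 dB/year) as anchors. Note the actual study used a latent class mixed model over whole trajectories, not per-eye nearest-anchor thresholds; this is only a sketch of the labelling intuition.

```python
# Sketch: per-eye MD slope and nearest-anchor subtype labelling.
import numpy as np

ANCHORS = {"Improver": 0.08, "Stable": -0.06,
           "Slow progressor": -0.21, "Fast progressor": -0.45}

def md_slope(years, md_values):
    """Least-squares MD slope in dB/year from follow-up visual field tests."""
    return np.polyfit(years, md_values, 1)[0]

def label(slope):
    # Assign the subtype whose reported mean rate is closest to this eye's slope.
    return min(ANCHORS, key=lambda k: abs(ANCHORS[k] - slope))

years = np.array([0, 1, 2, 3, 4, 5], dtype=float)
md = -0.5 * years + np.array([0.0, 0.1, -0.1, 0.05, -0.05, 0.0])  # noisy decline
print(label(md_slope(years, md)))
```

An eye declining at roughly -0.5 dB/year lands in the Fast progressor bucket, matching the most severe reported trajectory.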
Submitted 26 September, 2023;
originally announced September 2023.
-
ChatGPT Assisting Diagnosis of Neuro-ophthalmology Diseases Based on Case Reports
Authors:
Yeganeh Madadi,
Mohammad Delsoz,
Priscilla A. Lao,
Joseph W. Fong,
TJ Hollingsworth,
Malik Y. Kahook,
Siamak Yousefi
Abstract:
Objective: To evaluate the efficiency of large language models (LLMs) such as ChatGPT to assist in diagnosing neuro-ophthalmic diseases based on detailed case descriptions. Methods: We selected 22 different case reports of neuro-ophthalmic diseases from a publicly available online database. These cases included a wide range of chronic and acute diseases that are commonly seen by neuro-ophthalmic sub-specialists. We inserted the text from each case as a new prompt into both ChatGPT v3.5 and ChatGPT Plus v4.0 and asked for the most probable diagnosis. We then presented the same information to two neuro-ophthalmologists and recorded their diagnoses, followed by comparison to the responses from both versions of ChatGPT. Results: ChatGPT v3.5, ChatGPT Plus v4.0, and the two neuro-ophthalmologists were correct in 13 (59%), 18 (82%), 19 (86%), and 19 (86%) of the 22 cases, respectively. The agreement between the various diagnostic sources was as follows: ChatGPT v3.5 and ChatGPT Plus v4.0, 13 (59%); ChatGPT v3.5 and the first neuro-ophthalmologist, 12 (55%); ChatGPT v3.5 and the second neuro-ophthalmologist, 12 (55%); ChatGPT Plus v4.0 and the first neuro-ophthalmologist, 17 (77%); ChatGPT Plus v4.0 and the second neuro-ophthalmologist, 16 (73%); and the first and second neuro-ophthalmologists, 17 (77%). Conclusions: The accuracy of ChatGPT v3.5 and ChatGPT Plus v4.0 in diagnosing patients with neuro-ophthalmic diseases was 59% and 82%, respectively. With further development, ChatGPT Plus v4.0 may have potential to be used in clinical care settings to assist clinicians in providing quick, accurate diagnoses of patients in neuro-ophthalmology. The applicability of using LLMs like ChatGPT in clinical settings that lack access to subspeciality trained neuro-ophthalmologists deserves further research.
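The reported accuracies and agreements reduce to simple case-match counting; a minimal sketch with hypothetical diagnosis labels (note that 17 matching cases out of 22 is 77%):

```python
def accuracy(predictions, reference):
    """Fraction of cases where a rater's diagnosis matches the reference."""
    return sum(p == r for p, r in zip(predictions, reference)) / len(reference)

def agreement(rater_a, rater_b):
    """Raw pairwise agreement: fraction of cases with identical diagnoses."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
```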
Submitted 4 September, 2023;
originally announced September 2023.
-
Streaming Active Learning with Deep Neural Networks
Authors:
Akanksha Saran,
Safoora Yousefi,
Akshay Krishnamurthy,
John Langford,
Jordan T. Ash
Abstract:
Active learning is perhaps most naturally posed as an online learning problem. However, prior active learning approaches with deep neural networks assume offline access to the entire dataset ahead of time. This paper proposes VeSSAL, a new algorithm for batch active learning with deep neural networks in streaming settings, which samples groups of points to query for labels at the moment they are encountered. Our approach trades off between uncertainty and diversity of queried samples to match a desired query rate without requiring any hand-tuned hyperparameters. Altogether, we expand the applicability of deep neural networks to realistic active learning scenarios, such as applications relevant to HCI and large, fractured datasets.
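The streaming uncertainty/diversity trade-off can be loosely sketched as follows; this is an illustrative simplification, not the published VeSSAL update rule, and the `budget_rate` scaling and ridge term are assumptions:

```python
import numpy as np

def streaming_select(embeddings, budget_rate, ridge=1e-3, seed=0):
    """Streaming selection favoring directions not yet covered by earlier
    picks -- a loose sketch of the uncertainty/diversity trade-off, not the
    published VeSSAL update rule."""
    rng = np.random.default_rng(seed)
    d = embeddings.shape[1]
    cov = ridge * np.eye(d)            # covariance of points selected so far
    chosen = []
    for i, z in enumerate(embeddings):
        leverage = float(z @ np.linalg.solve(cov, z))   # novelty of z
        p = min(1.0, budget_rate * leverage)            # scale to query rate
        if rng.random() < p:
            chosen.append(i)
            cov += np.outer(z, z)
    return chosen
```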
Submitted 6 June, 2023; v1 submitted 4 March, 2023;
originally announced March 2023.
-
An information-theoretic branch-and-prune algorithm for discrete phase optimization of RIS in massive MIMO
Authors:
I. Zakir Ahmed,
Hamid R. Sadjadpour,
Shahram Yousefi
Abstract:
In this paper, we consider passive RIS-assisted multi-user communication between wireless nodes to improve the blocked line-of-sight (LOS) link performance. The wireless nodes are assumed to be equipped with Massive Multiple-Input Multiple-Output antennas, hybrid precoders, combiners, and low-resolution analog-to-digital converters (ADCs). We first derive the expression for the Cramer-Rao lower bound (CRLB) of the Mean Squared Error (MSE) of the received and combined signal at the intended receiver under interference. By appropriate design of the hybrid precoder, combiner, and RIS phase settings, it can be shown that the MSE achieves the CRLB. We further show that minimizing the MSE w.r.t. the phase settings of the RIS is equivalent to maximizing the throughput and energy efficiency of the system. We then propose a novel Information-Directed Branch-and-Prune (IDBP) algorithm to derive the phase settings of the RIS. For the first time in the literature, we use an information-theoretic measure to decide on the pruning rules in a tree-search algorithm to arrive at the RIS phase-setting solution, which is vastly different from the traditional branch-and-bound algorithm that uses bounds of the cost function to define the pruning rules. In addition, we provide theoretical guarantees of the near-optimality of the RIS phase-setting solution thus obtained, using the Asymptotic Equipartition Property. This also ensures near-optimal throughput and MSE performance.
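For contrast with the paper's information-directed rule, a conventional branch-and-bound over discrete phases looks like the sketch below; the toy objective (coherently aligning channel coefficients) and the triangle-inequality bound are illustrative choices, not the paper's CRLB-based cost:

```python
import numpy as np

def best_phases(h, phase_set):
    """Pick one discrete phase per RIS element to maximize |sum_k e^{j th_k} h_k|.
    Plain depth-first branch-and-bound with a triangle-inequality bound -- a
    generic stand-in for the paper's information-directed pruning rule."""
    n = len(h)
    amps = np.abs(np.asarray(h))
    # suffix[k] = total amplitude still reachable from element k onward
    suffix = np.concatenate([np.cumsum(amps[::-1])[::-1], [0.0]])
    best = {"val": -1.0, "cfg": None}

    def dfs(k, acc, cfg):
        if abs(acc) + suffix[k] <= best["val"]:
            return                      # bound cannot beat incumbent: prune
        if k == n:                      # leaf: the prune test guarantees improvement
            best["val"], best["cfg"] = abs(acc), list(cfg)
            return
        for th in phase_set:
            dfs(k + 1, acc + np.exp(1j * th) * h[k], cfg + [th])

    dfs(0, 0.0 + 0.0j, [])
    return best["cfg"], best["val"]
```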
Submitted 15 January, 2023;
originally announced January 2023.
-
The Hitchhiker's Guide to Successful Living Lab Operations
Authors:
Alan Wang,
Feng Yi Chang,
Siavash Yousefi,
Beatrice Li,
Brad Campbell,
Arsalan Heydarian
Abstract:
Living labs have been established across different countries to evaluate how the interaction between humans and buildings can be optimized to improve comfort, health, and energy savings. However, existing living labs can be too project-specific, not scalable, and inflexible for comparison against other labs. Furthermore, the lack of transparency in their software infrastructure inhibits opportunities for critique and reuse, reducing these platforms' overall potential. In the face of climate change and global energy shortage, we envision the future of living labs to be open source and scalable to support the integration of different IoTs, subjective measures, human-building interactions, security, and privacy contexts. In this work, we share our living lab software stack and present our experience developing a platform that supports qualitative and quantitative experiments from the ground up. We propose the first open-source interoperable living lab platform for multidisciplinary smart environment research.
Submitted 20 November, 2022;
originally announced December 2022.
-
Detecting Elevated Air Pollution Levels by Monitoring Web Search Queries: Deep Learning-Based Time Series Forecasting
Authors:
Chen Lin,
Safoora Yousefi,
Elvis Kahoro,
Payam Karisani,
Donghai Liang,
Jeremy Sarnat,
Eugene Agichtein
Abstract:
Real-time air pollution monitoring is a valuable tool for public health and environmental surveillance. In recent years, there has been a dramatic increase in air pollution forecasting and monitoring research using artificial neural networks (ANNs). Most of the prior work relied on modeling pollutant concentrations collected from ground-based monitors and meteorological data for long-term forecasting of outdoor ozone, oxides of nitrogen, and PM2.5. Given that traditional, highly sophisticated air quality monitors are expensive and are not universally available, these models cannot adequately serve those not living near pollutant monitoring sites. Furthermore, because prior models were built on physical measurement data collected from sensors, they may not be suitable for predicting public health effects experienced from pollution exposure. This study aims to develop and validate models to nowcast the observed pollution levels using Web search data, which is publicly available in near real-time from major search engines. We developed novel machine learning-based models using both traditional supervised classification methods and state-of-the-art deep learning methods to detect elevated air pollution levels at the US city level, by using generally available meteorological data and aggregate Web-based search volume data derived from Google Trends. We validated the performance of these methods by predicting three critical air pollutants (ozone (O3), nitrogen dioxide (NO2), and fine particulate matter (PM2.5)), across ten major U.S. metropolitan statistical areas (MSAs) in 2017 and 2018.
Submitted 9 November, 2022;
originally announced November 2022.
-
Stacking Ensemble Learning in Deep Domain Adaptation for Ophthalmic Image Classification
Authors:
Yeganeh Madadi,
Vahid Seydi,
Jian Sun,
Edward Chaum,
Siamak Yousefi
Abstract:
Domain adaptation is an attractive approach given the availability of a large amount of labeled data with similar properties but from different domains. It is effective in image classification tasks where obtaining sufficient labeled data is challenging. We propose a novel method, named SELDA, for stacking ensemble learning that extends three domain adaptation methods to effectively solve real-world problems. The major assumption is that when base domain adaptation models are combined, we can obtain a more accurate and robust model by exploiting the ability of each of the base models. We extend Maximum Mean Discrepancy (MMD), low-rank coding, and Correlation Alignment (CORAL) to compute the adaptation loss in the three base models. Also, we utilize a two-layer fully connected network as a meta-model to stack the output predictions of these three well-performing domain adaptation models to obtain high accuracy in ophthalmic image classification tasks. The experimental results using the Age-Related Eye Disease Study (AREDS) benchmark ophthalmic dataset demonstrate the effectiveness of the proposed model.
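The stacking step itself can be sketched with a linear meta-model fit by least squares, standing in for the paper's two-fully-connected-layer meta-network; the scores and labels below are made up:

```python
import numpy as np

def fit_stacker(base_scores, labels):
    """Fit linear stacking weights over base-model scores by least squares --
    a simplified stand-in for the paper's two-layer meta-network."""
    X = np.column_stack(base_scores)       # one column per base model
    w, *_ = np.linalg.lstsq(X, np.asarray(labels, float), rcond=None)
    return w

def stack_predict(base_scores, w, threshold=0.5):
    """Combine base-model scores with the learned weights and threshold."""
    return (np.column_stack(base_scores) @ w >= threshold).astype(int)
```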
Submitted 27 September, 2022;
originally announced September 2022.
-
Early Discovery of Emerging Entities in Persian Twitter with Semantic Similarity
Authors:
Shahin Yousefi,
Mohsen Hooshmand,
Mohsen Afsharchi
Abstract:
Discovering emerging entities (EEs) is the problem of finding entities before their establishment. These entities can be critical for individuals, companies, and governments. Many of these entities can be discovered on social media platforms, e.g. Twitter. These entities have been a focus of research in academia and industry in recent years. As with any machine learning problem, data availability is one of the major challenges. This paper proposes EEPT, an online clustering method able to discover EEs without any need for training on a dataset. Additionally, due to the lack of a proper evaluation metric, this paper uses a new metric to evaluate the results. The results show that EEPT is promising and finds significant entities before their establishment.
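Training-free online clustering of the kind described can be sketched as nearest-centroid assignment with a similarity threshold; this is a hypothetical minimal version, not the exact EEPT procedure:

```python
import numpy as np

def online_cluster(vectors, sim_threshold=0.8):
    """Assign each incoming embedding to the most similar existing cluster,
    or open a new one -- a minimal sketch of training-free online clustering
    (hypothetical; not the exact EEPT procedure)."""
    centroids, counts, labels = [], [], []
    for v in vectors:
        v = np.asarray(v, float)
        v = v / np.linalg.norm(v)
        sims = [float(c @ v) for c in centroids]   # cosine: all unit vectors
        if sims and max(sims) >= sim_threshold:
            j = int(np.argmax(sims))
            counts[j] += 1
            c = centroids[j] + (v - centroids[j]) / counts[j]   # running mean
            centroids[j] = c / np.linalg.norm(c)
        else:
            centroids.append(v)
            counts.append(1)
            j = len(centroids) - 1
        labels.append(j)
    return labels, centroids
```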
Submitted 7 June, 2023; v1 submitted 6 July, 2022;
originally announced July 2022.
-
DASP: A Framework for Driving the Adoption of Software Security Practices
Authors:
Enrique Larios-Vargas,
Omar Elazhary,
Soroush Yousefi,
Derek Lowlind,
Michael L. W. Vliek,
Margaret-Anne Storey
Abstract:
Implementing software security practices is a critical concern in modern software development. Industry practitioners, security tool providers, and researchers have provided standard security guidelines and sophisticated security development tools to ensure a secure software development pipeline. But despite these efforts, there continues to be an increase in the number of vulnerabilities that can be exploited by malicious hackers. There is thus an urgent need to understand why developers still introduce security vulnerabilities into their applications and to understand what can be done to motivate them to write more secure code. To understand and address this problem further, we propose DASP, a framework for diagnosing and driving the adoption of software security practices among developers. DASP was conceived by combining behavioral science theories to shape a cross-sectional interview study with 28 software practitioners. Our interviews lead to a framework that consists of a comprehensive set of 33 drivers grouped into 7 higher-level categories that represent what needs to happen or change so that the adoption of software security practices occurs. Using the DASP framework, organizations can design interventions suitable for developers' specific development contexts that will motivate them to write more secure code.
Submitted 24 May, 2022;
originally announced May 2022.
-
Constrained Resource Allocation Problems in Communications: An Information-assisted Approach
Authors:
I. Zakir Ahmed,
Hamid Sadjadpour,
Shahram Yousefi
Abstract:
We consider a class of resource allocation problems given a set of unconditional constraints whose objective function satisfies Bellman's optimality principle. Such problems are ubiquitous in wireless communication, signal processing, and networking. These constrained combinatorial optimization problems are, in general, NP-Hard. This paper proposes two algorithms to solve this class of problems using a dynamic programming framework assisted by an information-theoretic measure. We demonstrate that the proposed algorithms ensure optimal solutions under carefully chosen conditions and use significantly reduced computational resources. We substantiate our claims by solving the power-constrained bit allocation problem in 5G massive Multiple-Input Multiple-Output receivers using the proposed approach.
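The Bellman structure such problems exhibit can be illustrated with a small dynamic program for power-constrained bit allocation; the per-bit power costs and the utility function below are hypothetical:

```python
from functools import lru_cache

def allocate_bits(n_adcs, power_budget, power_per_bit, utility):
    """Dynamic-programming bit allocation: choose a bit depth per ADC under a
    total power budget, maximizing summed utility. Illustrates the Bellman
    recursion; the power/utility models are hypothetical."""

    @lru_cache(maxsize=None)
    def best(i, budget):
        # best(i, budget) = optimal (value, allocation) for ADCs i..n-1
        if i == n_adcs:
            return 0.0, ()
        out = (float("-inf"), ())
        for bits, cost in power_per_bit.items():
            if cost <= budget:
                val, rest = best(i + 1, budget - cost)
                out = max(out, (utility(bits) + val, (bits,) + rest))
        return out

    return best(0, power_budget)
```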
Submitted 7 December, 2021;
originally announced December 2021.
-
Joint Registration and Segmentation via Multi-Task Learning for Adaptive Radiotherapy of Prostate Cancer
Authors:
Mohamed S. Elmahdy,
Laurens Beljaards,
Sahar Yousefi,
Hessam Sokooti,
Fons Verbeek,
U. A. van der Heide,
Marius Staring
Abstract:
Medical image registration and segmentation are two of the most frequent tasks in medical image analysis. As these tasks are complementary and correlated, it would be beneficial to apply them simultaneously in a joint manner. In this paper, we formulate registration and segmentation as a joint problem via a Multi-Task Learning (MTL) setting, allowing these tasks to leverage their strengths and mitigate their weaknesses through the sharing of beneficial information. We propose to merge these tasks not only on the loss level, but on the architectural level as well. We studied this approach in the context of adaptive image-guided radiotherapy for prostate cancer, where planning and follow-up CT images as well as their corresponding contours are available for training. The study involves two datasets from different manufacturers and institutes. The first dataset was divided into training (12 patients) and validation (6 patients) sets, and was used to optimize and validate the methodology, while the second dataset (14 patients) was used as an independent test set. We carried out an extensive quantitative comparison between the quality of the automatically generated contours from different network architectures as well as loss weighting methods. Moreover, we evaluated the quality of the generated deformation vector field (DVF). We show that MTL algorithms outperform their Single-Task Learning (STL) counterparts and achieve better generalization on the independent test set. The best algorithm achieved a mean surface distance of $1.06 \pm 0.3$ mm, $1.27 \pm 0.4$ mm, $0.91 \pm 0.4$ mm, and $1.76 \pm 0.8$ mm on the validation set for the prostate, seminal vesicles, bladder, and rectum, respectively. The high accuracy of the proposed method, combined with its fast inference speed, makes it a promising method for automatic re-contouring of follow-up scans in adaptive radiotherapy.
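The mean surface distance reported in the abstract can be computed from sampled contour points as sketched below (toy point sets; real evaluations use densely sampled surfaces):

```python
import numpy as np

def mean_surface_distance(pts_a, pts_b):
    """Symmetric mean surface distance between two contours, each given as an
    (n, d) array of surface points -- the metric reported for the prostate
    structures (illustrative point sets here)."""
    a = np.asarray(pts_a, float)
    b = np.asarray(pts_b, float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # all pairs
    # average each surface's nearest-neighbor distance to the other surface
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```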
Submitted 4 May, 2021;
originally announced May 2021.
-
An Optimal Low-Complexity Energy-Efficient ADC Bit Allocation for Massive MIMO
Authors:
I. Zakir Ahmed,
Hamid Sadjadpour,
Shahram Yousefi
Abstract:
Fixed low-resolution Analog-to-Digital Converters (ADCs) help reduce the power consumption in millimeter-wave Massive Multiple-Input Multiple-Output (Ma-MIMO) receivers operating at large bandwidths. However, they do not guarantee optimal Energy Efficiency (EE). It has been shown that adopting variable-resolution (VR) ADCs in Ma-MIMO receivers can improve Mean Squared Error (MSE) and throughput performance while providing better EE. In this paper, we present an optimal energy-efficient bit allocation (BA) algorithm for Ma-MIMO receivers equipped with VR ADCs under a power constraint. We derive an expression for EE as a function of the Cramer-Rao Lower Bound on the MSE of the received, combined, and quantized signal. An optimal BA condition is derived by maximizing EE under a power constraint. We show that the optimal BA thus obtained is exactly the same as that obtained using brute-force BA, with a significant reduction in computational complexity. We also study the EE performance and computational complexity of a heuristic algorithm that yields a near-optimal solution.
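The brute-force baseline that the optimal rule is checked against can be sketched as an exhaustive search over bit vectors; the power and efficiency models below are hypothetical:

```python
from itertools import product

def brute_force_ba(n_adcs, bit_options, power, budget, efficiency):
    """Exhaustive-search bit allocation: try every bit vector within the power
    budget and keep the one with the best energy efficiency. This is the
    baseline the optimal BA rule is compared against; models are hypothetical."""
    best_cfg, best_ee = None, float("-inf")
    for cfg in product(bit_options, repeat=n_adcs):
        p = sum(power(b) for b in cfg)       # total power of this allocation
        if p <= budget:
            ee = efficiency(cfg, p)
            if ee > best_ee:
                best_cfg, best_ee = cfg, ee
    return best_cfg, best_ee
```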
Submitted 11 April, 2021;
originally announced April 2021.
-
ASL to PET Translation by a Semi-supervised Residual-based Attention-guided Convolutional Neural Network
Authors:
Sahar Yousefi,
Hessam Sokooti,
Wouter M. Teeuwisse,
Dennis F. R. Heijtel,
Aart J. Nederveen,
Marius Staring,
Matthias J. P. van Osch
Abstract:
Positron Emission Tomography (PET) is an imaging method that can assess physiological function rather than structural disturbances by measuring cerebral perfusion or glucose consumption. However, this imaging technique relies on injection of radioactive tracers and is expensive. In contrast, Arterial Spin Labeling (ASL) MRI is a non-invasive, non-radioactive, and relatively cheap imaging technique for brain hemodynamic measurements, which allows quantification to some extent. In this paper we propose a convolutional neural network (CNN) based model for translating ASL to PET images, which could benefit patients as well as the healthcare system in terms of expenses and adverse side effects. However, acquiring a sufficient number of paired ASL-PET scans for training a CNN is prohibitive for many reasons. To tackle this problem, we present a new semi-supervised multitask CNN which is trained on both paired data, i.e. ASL and PET scans, and unpaired data, i.e. only ASL scans, which alleviates the problem of training a network on limited paired data. Moreover, we present a new residual-based attention-guided mechanism to improve the contextual features during the training process. Also, we show that incorporating T1-weighted scans as an input, due to their high resolution and anatomical information, improves the results. We performed a two-stage evaluation based on quantitative image metrics by conducting a 7-fold cross validation followed by a double-blind observer study. The proposed network achieved structural similarity index measure (SSIM), mean squared error (MSE) and peak signal-to-noise ratio (PSNR) values of $0.85\pm0.08$, $0.01\pm0.01$, and $21.8\pm4.5$ respectively, for translating from 2D ASL and T1-weighted images to PET data. The proposed model is publicly available via https://github.com/yousefis/ASL2PET.
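The MSE and PSNR figures follow the standard definitions, sketched here for a normalized intensity range (SSIM requires windowed statistics and is omitted):

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.mean((a - b) ** 2))

def psnr(a, b, data_range=1.0):
    """Peak signal-to-noise ratio in dB for a given intensity range."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(data_range ** 2 / m)
```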
Submitted 8 March, 2021;
originally announced March 2021.
-
Esophageal Tumor Segmentation in CT Images using Dilated Dense Attention Unet (DDAUnet)
Authors:
Sahar Yousefi,
Hessam Sokooti,
Mohamed S. Elmahdy,
Irene M. Lips,
Mohammad T. Manzuri Shalmani,
Roel T. Zinkstok,
Frank J. W. M. Dankers,
Marius Staring
Abstract:
Manual or automatic delineation of the esophageal tumor in CT images is known to be very challenging. This is due to the low contrast between the tumor and adjacent tissues, the anatomical variation of the esophagus, as well as the occasional presence of foreign bodies (e.g. feeding tubes). Physicians therefore usually exploit additional knowledge such as endoscopic findings, clinical history, and additional imaging modalities like PET scans. Acquiring this additional information is time-consuming, and the results are error-prone and potentially non-deterministic. In this paper we aim to investigate if and to what extent a simplified clinical workflow based on CT alone allows one to automatically segment the esophageal tumor with sufficient quality. For this purpose, we present a fully automatic end-to-end esophageal tumor segmentation method based on convolutional neural networks (CNNs). The proposed network, called Dilated Dense Attention Unet (DDAUnet), leverages spatial and channel attention gates in each dense block to selectively concentrate on determinant feature maps and regions. Dilated convolutional layers are used to manage GPU memory and increase the network receptive field. We collected a dataset of 792 scans from 288 distinct patients including varying anatomies with air pockets, feeding tubes and proximal tumors. Repeatability and reproducibility studies were conducted for three distinct splits of training and validation sets. The proposed network achieved a $\mathrm{DSC}$ value of $0.79 \pm 0.20$, a mean surface distance of $5.4 \pm 20.2$ mm and a $95\%$ Hausdorff distance of $14.7 \pm 25.0$ mm for 287 test scans, demonstrating promising results with a simplified clinical workflow based on CT alone. Our code is publicly available via https://github.com/yousefis/DenseUnet_Esophagus_Segmentation.
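The reported DSC is the standard Dice overlap between two binary masks:

```python
import numpy as np

def dice(seg, ref):
    """Dice similarity coefficient between two binary masks: twice the
    intersection over the sum of the mask sizes."""
    seg = np.asarray(seg, bool)
    ref = np.asarray(ref, bool)
    denom = seg.sum() + ref.sum()
    # two empty masks agree perfectly by convention
    return 1.0 if denom == 0 else 2.0 * np.logical_and(seg, ref).sum() / denom
```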
Submitted 24 March, 2021; v1 submitted 6 December, 2020;
originally announced December 2020.
-
An Adaptive Intelligence Algorithm for Undersampled Knee MRI Reconstruction
Authors:
Nicola Pezzotti,
Sahar Yousefi,
Mohamed S. Elmahdy,
Jeroen van Gemert,
Christophe Schülke,
Mariya Doneva,
Tim Nielsen,
Sergey Kastryulin,
Boudewijn P. F. Lelieveldt,
Matthias J. P. van Osch,
Elwin de Weerdt,
Marius Staring
Abstract:
Adaptive intelligence aims at empowering machine learning techniques with the additional use of domain knowledge. In this work, we present the application of adaptive intelligence to accelerate MR acquisition. Starting from undersampled k-space data, an iterative learning-based reconstruction scheme inspired by compressed sensing theory is used to reconstruct the images. We adopt deep neural networks to refine and correct prior reconstruction assumptions given the training data. The network was trained and tested on a knee MRI dataset from the 2019 fastMRI challenge organized by Facebook AI Research and NYU Langone Health. All submissions to the challenge were initially ranked based on similarity with a known ground truth, after which the top 4 submissions were evaluated radiologically. Our method was evaluated by the fastMRI organizers on an independent challenge dataset. It ranked #1 on the 8x accelerated multi-coil track, shared #1 on the 4x multi-coil track, and ranked #3 on the 4x single-coil track. This demonstrates the superior performance and wide applicability of the method.
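The compressed-sensing iteration that such learned schemes build on is classic ISTA; the sketch below is the generic algorithm, not the challenge submission:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam=0.1, n_iter=200):
    """Iterative shrinkage-thresholding for min ||Ax - y||^2 + lam * ||x||_1,
    the classic compressed-sensing loop that learned reconstruction schemes
    unroll and augment with trained networks (generic sketch)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - (A.T @ (A @ x - y)) / L, lam / L)
    return x
```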
Submitted 27 October, 2020; v1 submitted 15 April, 2020;
originally announced April 2020.
-
3D Convolutional Neural Networks Image Registration Based on Efficient Supervised Learning from Artificial Deformations
Authors:
Hessam Sokooti,
Bob de Vos,
Floris Berendsen,
Mohsen Ghafoorian,
Sahar Yousefi,
Boudewijn P. F. Lelieveldt,
Ivana Isgum,
Marius Staring
Abstract:
We propose a supervised nonrigid image registration method, trained using artificial displacement vector fields (DVF), for which we propose and compare three network architectures. The artificial DVFs allow training in a fully supervised and voxel-wise dense manner, but without the cost usually associated with the creation of densely labeled data. We propose a scheme to artificially generate DVFs, and for chest CT registration augment these with simulated respiratory motion. The proposed architectures are embedded in a multi-stage approach, to increase the capture range of the proposed networks in order to more accurately predict larger displacements. The proposed method, RegNet, is evaluated on multiple databases of chest CT scans and achieved a target registration error of 2.32 $\pm$ 5.33 mm and 1.86 $\pm$ 2.12 mm on SPREAD and DIR-Lab-4DCT studies, respectively. The average inference time of RegNet with two stages is about 2.2 s.
Submitted 27 August, 2019;
originally announced August 2019.
-
Fast Dynamic Perfusion and Angiography Reconstruction using an end-to-end 3D Convolutional Neural Network
Authors:
Sahar Yousefi,
Lydiane Hirschler,
Merlijn van der Plas,
Mohamed S. Elmahdy,
Hessam Sokooti,
Matthias Van Osch,
Marius Staring
Abstract:
Hadamard time-encoded pseudo-continuous arterial spin labeling (te-pCASL) is a signal-to-noise ratio (SNR)-efficient MRI technique for acquiring dynamic pCASL signals that encodes the temporal information into the labeling according to a Hadamard matrix. In the decoding step, the contribution of each sub-bolus can be isolated resulting in dynamic perfusion scans. When acquiring te-ASL both with and without flow-crushing, the ASL-signal in the arteries can be isolated resulting in 4D-angiographic information. However, obtaining multi-timepoint perfusion and angiographic data requires two acquisitions. In this study, we propose a 3D Dense-Unet convolutional neural network with a multi-level loss function for reconstructing multi-timepoint perfusion and angiographic information from an interleaved $50\%$-sampled crushed and $50\%$-sampled non-crushed data, thereby negating the additional scan time. We present a framework to generate dynamic pCASL training and validation data, based on models of the intravascular and extravascular te-pCASL signals. The proposed network achieved SSIM values of $97.3 \pm 1.1$ and $96.2 \pm 11.1$ respectively for 4D perfusion and angiographic data reconstruction for 313 test data-sets.
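The Hadamard encode/decode step can be illustrated in a toy form: with a Sylvester Hadamard matrix H, the identity H^T H = nI lets each sub-bolus contribution be isolated. Real te-pCASL applies a signed label/control pattern per sub-bolus, which is handled analogously:

```python
import numpy as np

def hadamard(n):
    """Sylvester-construction Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def te_decode(encoded, H):
    """Recover per-sub-bolus signals from Hadamard time-encoded acquisitions,
    using H^T H = n I (toy illustration of the decoding step)."""
    return H.T @ np.asarray(encoded) / H.shape[0]
```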
Submitted 4 September, 2019; v1 submitted 24 August, 2019;
originally announced August 2019.
-
Optimal Bit Allocation Variable-Resolution ADC for Massive MIMO
Authors:
I. Zakir Ahmed,
Hamid Sadjadpour,
Shahram Yousefi
Abstract:
In this paper, we derive an optimal ADC bit-allocation (BA) condition for a Single-User (SU) millimeter-wave (mmWave) massive Multiple-Input Multiple-Output (Ma-MIMO) receiver equipped with variable-resolution ADCs under a power constraint, using the following criteria: (i) minimizing the Mean Squared Error (MSE) of the received, quantized, and combined symbol vector, and (ii) maximizing the capacity of the SU mmWave Ma-MIMO channel encompassing the hybrid precoder and combiner. The optimal BA under both criteria turns out to be the same. We jointly design the hybrid combiner based on the SVD of the channel. We demonstrate the improvement of the proposed optimal BA over BA based on minimization of the Mean Square Quantization Error (MSQE). Using Monte Carlo simulations, it is shown that the MSE and capacity performance of the proposed BA is very close to that of Exhaustive Search (ES). The computational complexity of the proposed techniques is compared with the ES and MSQE BA algorithms.
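As a concrete, much-simplified illustration of bit allocation under an ADC power budget (not the paper's closed-form MSE/capacity condition): a greedy allocator that models per-path quantization MSE as p·2^(-2b) and ADC power as 2^b per converter, both common rules of thumb. The signal powers and budget below are invented.

```python
# Illustrative greedy ADC bit allocation: assign bits one at a time to the RF
# path with the largest marginal drop in quantization MSE, while total ADC
# power (~ sum of 2^b_i) stays within the budget. This is a toy stand-in for
# the paper's optimal bit-allocation condition.

def greedy_bit_allocation(signal_powers, power_budget, max_bits=8):
    bits = [0] * len(signal_powers)

    def qmse(b, p):                      # quantization MSE ~ p * 2^(-2b)
        return p * 2.0 ** (-2 * b)

    def adc_power(bits):                 # ADC power grows ~ 2^b per converter
        return sum(2.0 ** b for b in bits)

    while True:
        best, best_gain = None, 0.0
        for i, p in enumerate(signal_powers):
            if bits[i] >= max_bits:
                continue
            trial = bits[:]
            trial[i] += 1
            if adc_power(trial) > power_budget:
                continue                 # this extra bit would break the budget
            gain = qmse(bits[i], p) - qmse(trial[i], p)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            return bits                  # no affordable bit helps any further
        bits[best] += 1

alloc = greedy_bit_allocation([4.0, 1.0, 0.25], power_budget=20.0)
```

Stronger paths end up with at least as many bits as weaker ones, which is the qualitative behavior optimal allocations exhibit.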
Submitted 9 February, 2019;
originally announced February 2019.
-
A Fully Bayesian Infinite Generative Model for Dynamic Texture Segmentation
Authors:
Sahar Yousefi,
M. T. Manzuri Shalmani,
Antoni B. Chan
Abstract:
Generative dynamic texture models (GDTMs) are widely used for dynamic texture (DT) segmentation in video sequences. GDTMs represent DTs as a set of linear dynamical systems (LDSs). A major limitation of these models concerns the automatic selection of a proper number of DTs. Dirichlet process mixture (DPM) models, which have recently emerged as a cornerstone of non-parametric Bayesian statistics, are a promising candidate for resolving this issue. Motivated by this, we propose a novel non-parametric fully Bayesian approach for DT segmentation, formulated as a joint DPM and GDTM construction. This combination allows the algorithm to determine the number of segments automatically. We derive the Variational Bayesian Expectation-Maximization (VBEM) inference for the proposed model. Moreover, in the E-step of inference, we apply the Rauch-Tung-Striebel smoother (RTSS) to the variational Bayesian LDSs. Finally, experiments on different video sequences are performed. The results indicate that the proposed algorithm noticeably outperforms previous methods in both efficiency and accuracy.
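The DPM side of such a model can be sketched via the stick-breaking construction: mixture weights pi_k = v_k · prod_{j&lt;k}(1 - v_j) with v_k ~ Beta(1, alpha), so the number of effective components is not fixed in advance. The concentration parameter and truncation level below are illustrative choices, not values from the paper.

```python
import random

# Stick-breaking sketch of Dirichlet-process mixture weights: break off a
# Beta(1, alpha) fraction of the remaining stick at each step. Larger alpha
# spreads mass over more components (more candidate dynamic textures).

def stick_breaking(alpha, truncation, rng):
    remaining, weights = 1.0, []
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)   # fraction of the stick to break off
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights

rng = random.Random(0)
pi = stick_breaking(alpha=2.0, truncation=50, rng=rng)
```

The weights sum to just under one (the leftover stick is the truncated tail), and they decay on average, which is what lets the model use only as many components as the data supports.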
Submitted 13 January, 2019;
originally announced January 2019.
-
Capacity analysis and bit allocation design for variable-resolution ADCs in Massive MIMO
Authors:
I. Zakir Ahmed,
Hamid Sadjadpour,
Shahram Yousefi
Abstract:
We derive an expression for the capacity of a massive multiple-input multiple-output millimeter-wave (mmWave) channel where the receiver is equipped with variable-resolution Analog-to-Digital Converters (ADCs) and a hybrid combiner. The capacity is shown to be a function of the Cramer-Rao Lower Bound (CRLB) for a given bit-allocation matrix and hybrid combiner. The condition for optimal ADC bit allocation under a receiver power constraint is then derived by maximizing the capacity with respect to the bit-allocation matrix for a given channel, hybrid precoder, and hybrid combiner. It is shown that this condition coincides with that obtained using the CRLB minimization proposed by Ahmed et al. Monte Carlo simulations show that the capacity calculated using the proposed condition matches very closely with the capacity obtained using Exhaustive Search bit allocation.
Submitted 8 September, 2018;
originally announced September 2018.
-
Single-User mmWave Massive MIMO: SVD-based ADC Bit Allocation and Combiner Design
Authors:
I. Zakir Ahmed,
Hamid Sadjadpour,
Shahram Yousefi
Abstract:
In this paper, we propose a Singular-Value-Decomposition-based variable-resolution Analog-to-Digital Converter (ADC) bit allocation design for a single-user millimeter-wave massive Multiple-Input Multiple-Output receiver. We derive the optimality condition for bit allocation under a power constraint. This condition ensures optimal receiver performance in the Mean Squared Error (MSE) sense. We derive the MSE expression and show that it approaches the Cramer-Rao Lower Bound (CRLB). The CRLB is seen to be a function of the analog combiner, the digital combiner, and the bit allocation matrix. We minimize the CRLB with respect to the bit allocation matrix by making suitable assumptions regarding the structure of the combiners. In doing so, the bit allocation design reduces to a set of simple inequalities involving the ADC bits, the channel singular values, and the covariance of the quantization noise along each RF path. This results in a simple and computationally efficient bit allocation algorithm. Using simulations, we show that the MSE performance of our proposed bit allocation is very close to that of Full Search (FS) bit allocation. We also show that the computational complexity of our proposed method is an order of magnitude lower than that of the FS and Genetic Algorithm based bit allocation of \cite{Zakir1}.
Submitted 23 April, 2018;
originally announced April 2018.
-
A Joint Combiner and Bit Allocation Design for Massive MIMO Using Genetic Algorithm
Authors:
I. Zakir Ahmed,
Hamid Sadjadpour,
Shahram Yousefi
Abstract:
In this paper, we derive a closed-form expression for the combiner of a multiple-input-multiple-output (MIMO) receiver equipped with a minimum-mean-square-error (MMSE) estimator. We propose using variable-bit-resolution analog-to-digital converters (ADCs) across radio frequency (RF) paths. The designed combiner is a function of the quantization errors across each RF path. Using very low bit-resolution ADCs (1-2 bits) is a popular approach in massive MIMO receiver architectures to mitigate large power demands. We show that for certain channel conditions, adopting unequal-bit-resolution ADCs (e.g., between 1 and 4 bits) on different RF chains, along with the proposed combiner, improves the performance of the MIMO receiver in the Mean Squared Error (MSE) sense. The variable-bit-resolution ADCs remain within the power constraint of using equal-bit-resolution ADCs on all paths (e.g., 2 bits). We propose a genetic algorithm in conjunction with the derived combiner to arrive at an optimal ADC bit allocation framework with a significant reduction in computational complexity.
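A toy version of the genetic-algorithm search could look as follows. The fitness here is a stand-in (negative total quantization MSE, with power-infeasible allocations scored minus infinity), not the paper's combiner-dependent MSE; the population size, mutation rate, signal powers, and power model are all illustrative.

```python
import random

# Toy genetic algorithm for ADC bit allocation under a power constraint.
# Chromosome = per-RF-path bit counts; one-point crossover + small mutations.

def fitness(bits, signal_powers, power_budget):
    if sum(2.0 ** b for b in bits) > power_budget:
        return float("-inf")             # infeasible: violates ADC power budget
    return -sum(p * 2.0 ** (-2 * b) for p, b in zip(signal_powers, bits))

def ga_bit_allocation(signal_powers, power_budget, pop=30, gens=60,
                      max_bits=6, seed=0):
    rng = random.Random(seed)
    n = len(signal_powers)
    # Seed with the all-zero allocation so at least one feasible member exists.
    population = [[0] * n] + [[rng.randint(0, max_bits) for _ in range(n)]
                              for _ in range(pop - 1)]
    for _ in range(gens):
        population.sort(key=lambda b: fitness(b, signal_powers, power_budget),
                        reverse=True)
        survivors = population[:pop // 2]
        children = []
        while len(children) < pop - len(survivors):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)                # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                   # mutation: tweak one gene
                i = rng.randrange(n)
                child[i] = max(0, min(max_bits, child[i] + rng.choice([-1, 1])))
            children.append(child)
        population = survivors + children
    return max(population, key=lambda b: fitness(b, signal_powers, power_budget))

best = ga_bit_allocation([4.0, 1.0, 0.25], power_budget=20.0)
```

Seeding the population with a trivially feasible individual is a standard trick to keep the search from stalling on an all-infeasible generation.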
Submitted 17 November, 2017;
originally announced November 2017.
-
A Novel Approach for Ellipsoidal Outer-Approximation of the Intersection Region of Ellipses in the Plane
Authors:
Siamak Yousefi,
Xiao-Wen Chang,
Henk Wymeersch,
Benoit Champagne,
Godfried Toussaint
Abstract:
In this paper, a novel technique for tight outer-approximation of the intersection region of a finite number of ellipses in 2-dimensional (2D) space is proposed. First, the vertices of a tight polygon that contains the convex intersection of the ellipses are found in an efficient manner. To do so, the intersection points of the ellipses that fall on the boundary of the intersection region are determined, and a set of points is generated on the elliptic arcs connecting every two neighbouring intersection points. By finding the tangent lines to the ellipses at the extended set of points, a set of half-planes is obtained, whose intersection forms a polygon. To find the polygon more efficiently, the points are given an order and the intersection of the half-planes corresponding to every two neighbouring points is calculated. If the polygon is convex and bounded, these calculated points together with the initially obtained intersection points will form its vertices. If the polygon is non-convex or unbounded, we can detect this situation and then generate additional discrete points only on the elliptical arc segment causing the issue, and restart the algorithm to obtain a bounded and convex polygon. Finally, the smallest area ellipse that contains the vertices of the polygon is obtained by solving a convex optimization problem. Through numerical experiments, it is illustrated that the proposed technique returns a tighter outer-approximation of the intersection of multiple ellipses, compared to conventional techniques, with only slightly higher computational cost.
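One building block of the method, the tangent half-planes whose intersection forms the polygon, is easy to sketch for an axis-aligned ellipse x²/a² + y²/b² = 1: at a boundary point (x0, y0), the interior satisfies (x0/a²)x + (y0/b²)y ≤ 1. The ellipse parameters and sample angles below are illustrative; the paper handles general ellipses and the ordering/convexity logic on top of this.

```python
import math

# Tangent half-planes to an axis-aligned ellipse: any region inside the
# ellipse (such as the intersection of several ellipses) lies inside every
# tangent half-plane, so a fan of them yields an outer polygon.

def boundary_point(a, b, theta):
    return (a * math.cos(theta), b * math.sin(theta))

def tangent_halfplane(a, b, point):
    """Return (nx, ny, c) with the ellipse interior in nx*x + ny*y <= c."""
    x0, y0 = point
    return (x0 / a**2, y0 / b**2, 1.0)

def contains(halfplane, point, tol=1e-9):
    nx, ny, c = halfplane
    return nx * point[0] + ny * point[1] <= c + tol

a, b = 3.0, 2.0
# Eight tangent half-planes form a polygon containing the whole ellipse.
planes = [tangent_halfplane(a, b, boundary_point(a, b, t))
          for t in [k * math.pi / 4 for k in range(8)]]
```

Interior points satisfy all eight half-planes, while points outside the polygon violate at least one, which is exactly the containment property the outer-approximation relies on.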
Submitted 18 September, 2017;
originally announced September 2017.
-
A Survey of Human Activity Recognition Using WiFi CSI
Authors:
Siamak Yousefi,
Hirokazu Narui,
Sankalp Dayal,
Stefano Ermon,
Shahrokh Valaee
Abstract:
In this article, we present a survey of recent advances in passive human behaviour recognition in indoor areas using the channel state information (CSI) of commercial WiFi systems. Movement of the human body causes a change in the wireless signal reflections, which results in variations in the CSI. By analyzing the data streams of CSIs for different activities and comparing them against stored models, human behaviour can be recognized. This is done by extracting features from CSI data streams and using machine learning techniques to build models and classifiers. The techniques from the literature that are presented herein achieve strong performance; however, instead of the machine learning techniques employed in these works, we propose using deep learning techniques such as a long short-term memory (LSTM) recurrent neural network (RNN), and show the improved performance. We also discuss different challenges such as environment change, frame rate selection, and multi-user scenarios, and suggest possible directions for future work.
Submitted 23 August, 2017;
originally announced August 2017.
-
A Novel Motion Detection Method Resistant to Severe Illumination Changes
Authors:
Sahar Yousefi,
M. T. Manzuri Shalmani,
Jeremy Lin,
Marius Staring
Abstract:
Recently, considerable attention has been given to the motion detection problem due to the explosive growth of its applications in video analysis and surveillance systems. While previous approaches can produce good results, accurate detection of motion remains a challenging task due to the difficulties raised by illumination variations, occlusion, camouflage, bursts of physical motion, dynamic texture, and environmental changes such as climate variations and sunlight changes during the day. In this paper, we propose a novel per-pixel motion descriptor for both motion detection and dynamic texture segmentation which outperforms current methods in the literature, particularly in severe scenarios. The proposed descriptor is based on two complementary transforms: the three-dimensional discrete wavelet transform (3D-DWT) and the three-dimensional wavelet leader. In this approach, a feature vector is extracted for each pixel by applying a novel three-dimensional wavelet-based motion descriptor. Then, the extracted features are clustered by a clustering method such as the well-known k-means algorithm or a Gaussian Mixture Model (GMM). The experimental results demonstrate the effectiveness of our proposed method compared to other motion detection approaches from the literature. The application of the proposed method and additional experimental results for the different datasets are available at http://dspl.ce.sharif.edu/motiondetector.html.
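The elementary operation behind such descriptors, a single-level 1-D Haar DWT (pairwise averages and differences), extends separably over x, y, and time to the 3-D transform the paper applies. The input samples below are illustrative, not data from the paper.

```python
# Single-level 1-D Haar discrete wavelet transform: the approximation band
# carries local averages, the detail band carries local differences (the
# high-frequency content that motion and texture changes show up in).

def haar_dwt_1d(x):
    """Return (approximation, detail) coefficients for an even-length signal."""
    a = [(x[2 * i] + x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]
    return a, d

approx, detail = haar_dwt_1d([4.0, 2.0, 5.0, 5.0, 1.0, 3.0])
```

Applying this filter along rows, columns, and the temporal axis of a video cube yields the separable 3-D-DWT sub-bands from which per-pixel features can be built.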
Submitted 15 March, 2018; v1 submitted 11 December, 2016;
originally announced December 2016.
-
Learning Genomic Representations to Predict Clinical Outcomes in Cancer
Authors:
Safoora Yousefi,
Congzheng Song,
Nelson Nauata,
Lee Cooper
Abstract:
Genomics is rapidly transforming medical practice and basic biomedical research, providing insights into disease mechanisms and improving therapeutic strategies, particularly in cancer. The ability to predict the future course of a patient's disease from high-dimensional genomic profiling will be essential in realizing the promise of genomic medicine, but presents significant challenges for state-of-the-art survival analysis methods. In this abstract we present an investigation into learning genomic representations with neural networks to predict patient survival in cancer. We demonstrate the advantages of this approach over existing survival analysis methods using brain tumor data.
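Models of this kind are often trained against the negative Cox partial log-likelihood: each observed event's risk score competes against the scores of everyone still at risk at that time. This standalone sketch uses made-up follow-up times and risk scores; in the paper's setting, the scores would come from a neural network over genomic features.

```python
import math

# Negative Cox partial log-likelihood for survival data. Higher risk scores
# should belong to patients who experience events earlier; censored patients
# contribute only through the risk sets of others.

def neg_cox_partial_loglik(times, events, scores):
    """times: follow-up times; events: 1 = event observed, 0 = censored;
    scores: model risk scores (higher = higher hazard)."""
    nll = 0.0
    for i, (ti, ei) in enumerate(zip(times, events)):
        if not ei:
            continue                       # censored cases enter only risk sets
        risk_set = [math.exp(s) for t, s in zip(times, scores) if t >= ti]
        nll -= scores[i] - math.log(sum(risk_set))
    return nll

times = [5.0, 8.0, 3.0, 12.0]
events = [1, 0, 1, 1]
good = neg_cox_partial_loglik(times, events, [0.5, -0.2, 1.5, -1.0])   # concordant
bad = neg_cox_partial_loglik(times, events, [-1.0, 0.2, -1.5, 1.0])    # reversed
```

Scores that rank early events as high-risk (the `good` assignment) yield a lower loss than the reversed ranking, which is the signal a learned representation is optimized to capture.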
Submitted 27 September, 2016;
originally announced September 2016.
-
A Continuous-time Mutually-Exciting Point Process Framework for Prioritizing Events in Social Media
Authors:
Mehrdad Farajtabar,
Safoora Yousefi,
Long Q. Tran,
Le Song,
Hongyuan Zha
Abstract:
The overwhelming amount and rate of information updates in online social media are making it increasingly difficult for users to allocate their attention to their topics of interest, so there is a strong need for prioritizing news feeds. The attractiveness of a post to a user depends on many complex contextual and temporal features of the post. For instance, the contents of the post, the responsiveness of a third user, and the age of the post may all have an impact. So far, these static and dynamic features have not been incorporated into a unified framework to tackle the post prioritization problem. In this paper, we propose a novel approach for prioritizing posts based on a feature-modulated multi-dimensional point process. Our model is able to simultaneously capture textual and sentiment features, and temporal features such as self-excitation, mutual excitation, and the bursty nature of social interaction. For evaluation, we also curated a real-world conversational benchmark dataset crawled from Facebook. In our experiments, we demonstrate that our algorithm achieves state-of-the-art performance in terms of analyzing, predicting, and prioritizing events. In terms of interpretability of our method, we observe that features indicating individual user profiles and linguistic characteristics of the events work best for prediction and prioritization of new events.
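The self-excitation idea at the core of such mutually-exciting (Hawkes-type) point processes can be sketched in one function: the conditional intensity is a base rate plus exponentially decaying kicks from past events. The parameters and event times below are illustrative, not fitted values, and the paper's full model additionally modulates the intensity with textual and sentiment features.

```python
import math

# Conditional intensity of a one-dimensional Hawkes process:
# lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i)).
# Each event temporarily raises the rate of future events, producing bursts.

def intensity(t, history, mu, alpha, beta):
    """history: list of past event times t_i < t."""
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in history if ti < t)

events = [0.5, 1.0, 1.2]                 # hypothetical post/reply times
lam = intensity(2.0, events, mu=0.1, alpha=0.8, beta=1.5)
```

Right after a burst of events the intensity is elevated, and it relaxes back toward the base rate mu as time passes, which is how the model captures bursty social interaction.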
Submitted 12 November, 2015;
originally announced November 2015.
-
Mobile Localization in Non-Line-of-Sight Using Constrained Square-Root Unscented Kalman Filter
Authors:
Siamak Yousefi,
Xiao-Wen Chang,
Benoit Champagne
Abstract:
Localization and tracking of a mobile node (MN) in non-line-of-sight (NLOS) scenarios, based on time of arrival (TOA) measurements, is considered in this work. To this end, we develop a constrained form of square root unscented Kalman filter (SRUKF), where the sigma points of the unscented transformation are projected onto the feasible region by solving constrained optimization problems. The feasible region is the intersection of several discs formed by the NLOS measurements. We show how we can reduce the size of the optimization problem and formulate it as a convex quadratically constrained quadratic program (QCQP), which depends on the Cholesky factor of the \textit{a posteriori} error covariance matrix of SRUKF. As a result of these modifications, the proposed constrained SRUKF (CSRUKF) is more efficient and has better numerical stability compared to the constrained UKF. Through simulations, we also show that the CSRUKF achieves a smaller localization error compared to other techniques and that its performance is robust under different NLOS conditions.
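The paper projects the unscented-transform sigma points onto the feasible region (an intersection of discs implied by the NLOS TOA measurements) by solving a convex QCQP. Below is a much simpler sketch of the same geometric idea using alternating projections (POCS), which converges for an intersection of convex discs; the disc centers and radii are made-up anchors, not data from the paper.

```python
import math

# Project a point onto the intersection of discs by cycling through the
# discs and projecting onto each one in turn (projections onto convex sets).

def project_onto_disc(point, center, radius):
    dx, dy = point[0] - center[0], point[1] - center[1]
    d = math.hypot(dx, dy)
    if d <= radius:
        return point                      # already inside this disc
    s = radius / d                        # pull back to the disc boundary
    return (center[0] + dx * s, center[1] + dy * s)

def project_onto_intersection(point, discs, iters=200):
    for _ in range(iters):
        for center, radius in discs:
            point = project_onto_disc(point, center, radius)
    return point

discs = [((0.0, 0.0), 2.0), ((1.5, 0.0), 2.0)]   # overlapping feasible discs
p = project_onto_intersection((6.0, 0.0), discs)
```

The QCQP in the paper finds the closest feasible point in one shot (and in the metric induced by the filter covariance); POCS just illustrates that an infeasible sigma point gets mapped into the feasible region.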
Submitted 1 May, 2014;
originally announced May 2014.
-
Distributed Cooperative Localization in Wireless Sensor Networks without NLOS Identification
Authors:
Siamak Yousefi,
Xiao-Wen Chang,
Benoit Champagne
Abstract:
In this paper, a two-stage robust distributed algorithm is proposed for cooperative sensor network localization using time-of-arrival (TOA) data without identification of non-line-of-sight (NLOS) links. In the first stage, to overcome the effect of outliers, a convex relaxation of the Huber loss function is applied so that, using iterative optimization techniques, good estimates of the true sensor locations can be obtained. In the second stage, the original (non-relaxed) Huber cost function is further optimized to obtain refined location estimates based on those obtained in the first stage. In both stages, a simple gradient descent technique is used to carry out the optimization. Through simulations and real data analysis, it is shown that the proposed convex relaxation generally achieves a lower root mean squared error (RMSE) than other convex relaxation techniques in the literature. Moreover, the second stage improves the position estimates, achieving an RMSE close to that of other distributed algorithms that know \textit{a priori} which links are in NLOS.
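The second-stage idea, gradient descent on the non-relaxed Huber loss of TOA range residuals, can be illustrated with a single-node toy. The anchors, the artificial +4 NLOS bias on the third link, and the Huber threshold are all invented; the paper's setting is cooperative (multi-node) and also includes the convex first stage.

```python
import math

# Huber-loss localization by gradient descent: quadratic near zero, linear in
# the tails, so large positive NLOS biases are down-weighted instead of
# dominating the fit the way a squared loss would let them.

def huber(r, delta=1.0):
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)

def huber_grad(r, delta=1.0):
    return r if abs(r) <= delta else (delta if r > 0 else -delta)

def total_loss(p, anchors, ranges, delta=1.0):
    return sum(huber(math.hypot(p[0] - ax, p[1] - ay) - d, delta)
               for (ax, ay), d in zip(anchors, ranges))

def localize(anchors, ranges, x0, step=0.05, iters=2000, delta=1.0):
    x, y = x0
    for _ in range(iters):
        gx = gy = 0.0
        for (ax, ay), d in zip(anchors, ranges):
            dist = math.hypot(x - ax, y - ay) or 1e-12
            g = huber_grad(dist - d, delta)      # residual: ||p - a_i|| - d_i
            gx += g * (x - ax) / dist            # chain rule through the norm
            gy += g * (y - ay) / dist
        x, y = x - step * gx, y - step * gy
    return x, y

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
ranges = [5.0, math.hypot(6.0, 3.0),
          math.hypot(4.0, 7.0) + 4.0,           # third link carries an NLOS bias
          math.hypot(6.0, 7.0)]
est = localize(anchors, ranges, x0=(5.0, 5.0))
```

Because the Huber tail caps the biased link's gradient at the threshold delta, the estimate ends up much closer to the true position (4, 3) than the 4-unit bias would suggest.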
Submitted 3 March, 2014;
originally announced March 2014.
-
Exact MIMO Zero-Forcing Detection Analysis for Transmit-Correlated Rician Fading
Authors:
Constantin Siriteanu,
Steven Blostein,
Akimichi Takemura,
Hyundong Shin,
Shahram Yousefi,
Satoshi Kuriki
Abstract:
We analyze the performance of multiple input/multiple output (MIMO) communications systems employing spatial multiplexing and zero-forcing detection (ZF). The distribution of the ZF signal-to-noise ratio (SNR) is characterized when either the intended stream or interfering streams experience Rician fading, and when the fading may be correlated on the transmit side. Previously, exact ZF analysis based on a well-known SNR expression has been hindered by the noncentrality of the Wishart distribution involved. In addition, approximation with a central-Wishart distribution has not proved consistently accurate. In contrast, the following exact ZF study proceeds from a lesser-known SNR expression that separates the intended and interfering channel-gain vectors. By first conditioning on, and then averaging over the interference, the ZF SNR distribution for Rician-Rayleigh fading is shown to be an infinite linear combination of gamma distributions. On the other hand, for Rayleigh-Rician fading, the ZF SNR is shown to be gamma-distributed. Based on the SNR distribution, we derive new series expressions for the ZF average error probability, outage probability, and ergodic capacity. Numerical results confirm the accuracy of our new expressions, and reveal effects of interference and channel statistics on performance.
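The well-known ZF post-processing SNR that such analyses start from is SNR_k = 1 / (σ² [(H^H H)^{-1}]_kk). A tiny real-valued 2x2 channel keeps the linear algebra explicit; the paper works with complex Rician/Rayleigh fading channels, and the values below are illustrative.

```python
# Per-stream zero-forcing SNR for a real 2x2 channel, computed from the
# diagonal of the inverse Gram matrix: correlated columns inflate those
# diagonal entries and therefore depress the ZF SNR.

def zf_snrs_2x2(H, noise_var):
    # Gram matrix G = H^T H for a real-valued 2x2 channel.
    g11 = H[0][0] ** 2 + H[1][0] ** 2
    g22 = H[0][1] ** 2 + H[1][1] ** 2
    g12 = H[0][0] * H[0][1] + H[1][0] * H[1][1]
    det = g11 * g22 - g12 * g12
    inv_diag = (g22 / det, g11 / det)     # diagonal of G^{-1}
    return [1.0 / (noise_var * d) for d in inv_diag]

H = [[1.0, 0.5],
     [0.2, 1.0]]
snrs = zf_snrs_2x2(H, noise_var=0.1)
```

For an orthogonal channel (g12 = 0) the inverse diagonal reduces to 1/g_kk and ZF loses nothing; the cross term g12 here is what costs each stream SNR.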
Submitted 2 January, 2014; v1 submitted 10 July, 2013;
originally announced July 2013.
-
A Joint Localization and Clock Bias Estimation Technique Using Time-of-Arrival at Multiple Antenna Receivers
Authors:
Siamak Yousefi,
Xiao-Wen Chang,
Benoit Champagne
Abstract:
In this work, a system scheme is proposed for tracking a radio-emitting target moving in two-dimensional space. The localization is based on the use of biased time-of-arrival (TOA) measurements obtained at two asynchronous receivers, each equipped with two closely spaced antennas. By exploiting the multi-antenna configuration and using all the TOA measurements up to the current time step, the relative clock bias at each receiver and the target position are jointly estimated by solving a nonlinear least-squares (NLS) problem. To this end, a novel time-recursive algorithm is proposed which fully exploits the problem structure to achieve computational efficiency while using orthogonal transformations to ensure numerical reliability. Simulations show that the mean-squared error (MSE) of the proposed method is much smaller than that of existing methods with the same antenna scheme, and approaches the Cramer-Rao lower bound (CRLB) closely.
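The joint position/clock-bias NLS problem can be illustrated with a plain batch Gauss-Newton iteration (the paper instead develops a time-recursive solver built on orthogonal transformations). The anchors, the true state, and the noiseless pseudoranges of the form ||p - a_i|| + b are invented for this sketch.

```python
import math

# Gauss-Newton for TOA pseudoranges sharing one clock bias: unknowns (x, y, b),
# model per measurement m_i = ||p - a_i|| + b, iterate d = (J^T J)^{-1} J^T r.

def solve3(A, v):
    """Gaussian elimination for a 3x3 system (no pivoting; fine for this toy)."""
    M = [row[:] + [w] for row, w in zip(A, v)]
    for c in range(3):
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    out = [0.0] * 3
    for c in (2, 1, 0):
        out[c] = (M[c][3] - sum(M[c][k] * out[k] for k in range(c + 1, 3))) / M[c][c]
    return out

def gauss_newton(anchors, rho, state, iters=50):
    x, y, b = state
    for _ in range(iters):
        JTJ = [[0.0] * 3 for _ in range(3)]      # normal equations J^T J d = J^T r
        JTr = [0.0] * 3
        for (ax, ay), m in zip(anchors, rho):
            d = math.hypot(x - ax, y - ay) or 1e-12
            J = [(x - ax) / d, (y - ay) / d, 1.0]   # row of the model Jacobian
            r = m - (d + b)                          # measurement residual
            for i in range(3):
                JTr[i] += J[i] * r
                for j in range(3):
                    JTJ[i][j] += J[i] * J[j]
        dx, dy, db = solve3(JTJ, JTr)
        x, y, b = x + dx, y + dy, b + db
    return x, y, b

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
rho = [math.hypot(4.0 - ax, 3.0 - ay) + 0.7 for ax, ay in anchors]  # truth (4, 3), bias 0.7
est = gauss_newton(anchors, rho, state=(5.0, 5.0, 0.0))
```

With four measurements and three unknowns, the noiseless problem is well-posed and Gauss-Newton recovers both the position and the clock bias; the paper's recursive formulation avoids re-solving this batch problem at every time step.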
Submitted 12 May, 2014; v1 submitted 3 March, 2013;
originally announced March 2013.
-
Gender Recognition Based on SIFT Features
Authors:
Sahar Yousefi,
Morteza Zahedi
Abstract:
This paper proposes a robust approach for face detection and gender classification in color images. Previous research on gender recognition assumes an expensive, computationally intensive, and time-consuming pre-processing alignment step, in which face images are aligned so that facial landmarks such as the eyes, nose, lips, and chin are placed in uniform locations in the image. In this paper, a novel technique based on mathematical analysis is presented in three stages that eliminates the alignment step. First, a new color-based face detection method is presented, with better results and more robustness in complex backgrounds. Next, features that are invariant to affine transformations are extracted from each face using the scale-invariant feature transform (SIFT) method. To evaluate the performance of the proposed algorithm, experiments have been conducted by employing an SVM classifier on a database of face images containing 500 images from distinct people with an equal ratio of male and female.
Submitted 6 August, 2011;
originally announced August 2011.