-
Modular Neural Image Signal Processing
Authors:
Mahmoud Afifi,
Zhongling Wang,
Ran Zhang,
Michael S. Brown
Abstract:
This paper presents a modular neural image signal processing (ISP) framework that processes raw inputs and renders high-quality display-referred images. Unlike prior neural ISP designs, our method introduces a high degree of modularity, providing full control over multiple intermediate stages of the rendering process. This modular design not only achieves high rendering accuracy but also improves scalability, debuggability, generalization to unseen cameras, and flexibility to match different user-preference styles. To demonstrate the advantages of this design, we built a user-interactive photo-editing tool that leverages our neural ISP to support diverse editing operations and picture styles. The tool is carefully engineered to take advantage of the high-quality rendering of our neural ISP and to enable unlimited post-editable re-rendering. Our method is a fully learning-based framework with variants of different capacities, all of moderate size (ranging from ~0.5 M to ~3.9 M parameters for the entire pipeline), and consistently delivers competitive qualitative and quantitative results across multiple test sets. Watch the supplemental video at: https://youtu.be/ByhQjQSjxVM
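Below is a minimal, hypothetical sketch of the modularity idea the abstract describes: each ISP stage is an independent, swappable module whose intermediate output can be inspected or overridden. The stage names, interfaces, and parameter values are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

class Stage:
    """Base class: maps an image of shape (H, W, 3) to an image."""
    def __call__(self, img: np.ndarray) -> np.ndarray:
        raise NotImplementedError

class WhiteBalance(Stage):
    def __init__(self, gains=(2.0, 1.0, 1.5)):  # hypothetical per-channel gains
        self.gains = np.asarray(gains, dtype=np.float32)
    def __call__(self, img):
        return np.clip(img * self.gains, 0.0, 1.0)

class GammaEncode(Stage):
    def __init__(self, gamma=1 / 2.2):
        self.gamma = gamma
    def __call__(self, img):
        return np.power(np.clip(img, 0.0, 1.0), self.gamma)

class ModularISP:
    """Runs stages in order and records every intermediate result, which is
    what enables per-stage inspection, debugging, and control."""
    def __init__(self, stages):
        self.stages = stages
    def __call__(self, raw):
        intermediates = {"input": raw}
        x = raw
        for stage in self.stages:
            x = stage(x)
            intermediates[type(stage).__name__] = x
        return x, intermediates

raw = np.random.rand(64, 64, 3).astype(np.float32)  # stand-in for a demosaiced raw image
isp = ModularISP([WhiteBalance(), GammaEncode()])
out, steps = isp(raw)
print(list(steps.keys()))  # ['input', 'WhiteBalance', 'GammaEncode']
```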
Submitted 9 December, 2025;
originally announced December 2025.
-
ForamDeepSlice: A High-Accuracy Deep Learning Framework for Foraminifera Species Classification from 2D Micro-CT Slices
Authors:
Abdelghafour Halimi,
Ali Alibrahim,
Didier Barradas-Bautista,
Ronell Sicat,
Abdulkader M. Afifi
Abstract:
This study presents a comprehensive deep learning pipeline for the automated classification of 12 foraminifera species using 2D micro-CT slices derived from 3D scans. We curated a scientifically rigorous dataset comprising 97 micro-CT scanned specimens across 27 species, selecting 12 species with sufficient representation for robust machine learning. To ensure methodological integrity and prevent data leakage, we employed specimen-level data splitting, resulting in 109,617 high-quality 2D slices (44,103 for training, 14,046 for validation, and 51,468 for testing). We evaluated seven state-of-the-art 2D convolutional neural network (CNN) architectures using transfer learning. Our final ensemble model, combining ConvNeXt-Large and EfficientNetV2-Small, achieved a test accuracy of 95.64%, with a top-3 accuracy of 99.6% and an area under the ROC curve (AUC) of 0.998 across all species. To facilitate practical deployment, we developed an interactive dashboard that supports real-time slice classification and 3D slice matching using similarity metrics including SSIM, NCC, and the Dice coefficient. This work establishes new benchmarks for AI-assisted micropaleontological identification and provides a fully reproducible framework for foraminifera classification research, bridging the gap between deep learning and applied geosciences.
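A minimal sketch of the specimen-level splitting described above, using scikit-learn's GroupShuffleSplit so that all slices from one specimen land in the same split. The data and split fractions here are synthetic placeholders; only the leakage-prevention pattern is the point.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_slices = 1000
slice_ids = np.arange(n_slices)
specimen_of_slice = rng.integers(0, 97, size=n_slices)   # 97 specimens, as in the paper
labels = rng.integers(0, 12, size=n_slices)              # 12 species

# First carve out a test set by specimen, then split the rest into train/val.
gss = GroupShuffleSplit(n_splits=1, test_size=0.4, random_state=0)
trainval_idx, test_idx = next(gss.split(slice_ids, labels, groups=specimen_of_slice))

gss2 = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
tr, va = next(gss2.split(trainval_idx, groups=specimen_of_slice[trainval_idx]))
train_idx, val_idx = trainval_idx[tr], trainval_idx[va]

# No specimen appears in more than one split:
assert not (set(specimen_of_slice[train_idx]) & set(specimen_of_slice[test_idx]))
assert not (set(specimen_of_slice[val_idx]) & set(specimen_of_slice[test_idx]))
```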
Submitted 30 November, 2025;
originally announced December 2025.
-
Raw-JPEG Adapter: Efficient Raw Image Compression with JPEG
Authors:
Mahmoud Afifi,
Ran Zhang,
Michael S. Brown
Abstract:
Digital cameras digitize scene light into linear raw representations, which the image signal processor (ISP) converts into display-ready outputs. While raw data preserves full sensor information--valuable for editing and vision tasks--formats such as Digital Negative (DNG) require large storage, making them impractical in constrained scenarios. In contrast, JPEG is a widely supported format, offering high compression efficiency and broad compatibility, but it is not well-suited for raw storage. This paper presents Raw-JPEG Adapter, a lightweight, learnable, and invertible preprocessing pipeline that adapts raw images for standard JPEG compression. Our method applies spatial and optional frequency-domain transforms, with compact parameters stored in the JPEG comment field, enabling accurate raw reconstruction. Experiments across multiple datasets show that our method achieves higher fidelity than direct JPEG storage, supports other codecs, and provides a favorable trade-off between compression ratio and reconstruction accuracy.
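A hedged, byte-level sketch of the general idea of stashing a few reconstruction parameters in a JPEG comment (COM) segment so the file remains a standard JPEG. This is an illustrative helper, not the paper's actual parameter format; the parser is simplified and assumes only length-prefixed segments.

```python
import json
import struct

def insert_jpeg_comment(jpeg_bytes: bytes, params: dict) -> bytes:
    """Insert a COM (0xFFFE) segment carrying JSON params right after SOI."""
    payload = json.dumps(params).encode("utf-8")
    # The segment length field counts itself (2 bytes) plus the payload.
    segment = b"\xFF\xFE" + struct.pack(">H", len(payload) + 2) + payload
    assert jpeg_bytes[:2] == b"\xFF\xD8", "not a JPEG (missing SOI marker)"
    return jpeg_bytes[:2] + segment + jpeg_bytes[2:]

def read_jpeg_comment(jpeg_bytes: bytes) -> dict:
    """Simplified scan over length-prefixed segments for the first COM."""
    i = 2
    while i + 4 <= len(jpeg_bytes):
        marker = jpeg_bytes[i:i + 2]
        length = struct.unpack(">H", jpeg_bytes[i + 2:i + 4])[0]
        if marker == b"\xFF\xFE":
            return json.loads(jpeg_bytes[i + 4:i + 2 + length])
        i += 2 + length
    raise ValueError("no COM segment found")

# Hypothetical per-image transform parameters to invert at decode time.
params = {"gamma": 0.45, "gains": [2.1, 1.0, 1.6]}
fake_jpeg = b"\xFF\xD8" + b"\xFF\xD9"  # SOI + EOI only, just for the demo
stored = insert_jpeg_comment(fake_jpeg, params)
print(read_jpeg_comment(stored))  # {'gamma': 0.45, 'gains': [2.1, 1.0, 1.6]}
```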
Submitted 29 September, 2025; v1 submitted 23 September, 2025;
originally announced September 2025.
-
Learning Camera-Agnostic White-Balance Preferences
Authors:
Luxi Zhao,
Mahmoud Afifi,
Michael S. Brown
Abstract:
The image signal processor (ISP) pipeline in modern cameras consists of several modules that transform raw sensor data into visually pleasing images in a display color space. Among these, the auto white balance (AWB) module is essential for compensating for scene illumination. However, commercial AWB systems often strive to compute aesthetic white-balance preferences rather than accurate neutral color correction. While learning-based methods have improved AWB accuracy, they typically struggle to generalize across different camera sensors -- an issue for smartphones with multiple cameras. Recent work has explored cross-camera AWB, but most methods remain focused on achieving neutral white balance. In contrast, this paper is the first to address aesthetic consistency by learning a post-illuminant-estimation mapping that transforms neutral illuminant corrections into aesthetically preferred corrections in a camera-agnostic space. Once trained, our mapping can be applied after any neutral AWB module to enable consistent and stylized color rendering across unseen cameras. Our proposed model is lightweight -- containing only ~500 parameters -- and runs in just 0.024 milliseconds on a typical flagship mobile CPU. Evaluated on a dataset of 771 smartphone images from three different cameras, our method achieves state-of-the-art performance while remaining fully compatible with existing cross-camera AWB techniques, introducing minimal computational and memory overhead.
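A minimal sketch of a few-hundred-parameter mapping from a neutral illuminant estimate to an aesthetically preferred one, in the spirit of the model described above. The layer sizes and the (r/g, b/g) chromaticity input are assumptions for illustration.

```python
import torch
import torch.nn as nn

class PreferenceMapper(nn.Module):
    """Maps a neutral illuminant chromaticity (r/g, b/g) to a preferred one."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )
    def forward(self, chroma):
        # Predict a residual so the identity mapping is easy to learn.
        return chroma + self.net(chroma)

model = PreferenceMapper()
print(sum(p.numel() for p in model.parameters()))  # 354 parameters with hidden=16

neutral = torch.tensor([[0.9, 1.1]])  # hypothetical (r/g, b/g) illuminant estimate
print(model(neutral))
```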
Submitted 14 August, 2025; v1 submitted 2 July, 2025;
originally announced July 2025.
-
CCMNet: Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy
Authors:
Dongyoung Kim,
Mahmoud Afifi,
Dongyun Kim,
Michael S. Brown,
Seon Joo Kim
Abstract:
Computational color constancy, or white balancing, is a key module in a camera's image signal processor (ISP) that corrects color casts from scene lighting. Because this operation occurs in the camera-specific raw color space, white balance algorithms must adapt to different cameras. This paper introduces a learning-based method for cross-camera color constancy that generalizes to new cameras without retraining. Our method leverages pre-calibrated color correction matrices (CCMs) available on ISPs that map the camera's raw color space to a standard space (e.g., CIE XYZ). It uses these CCMs to transform predefined illumination colors (i.e., along the Planckian locus) into the test camera's raw space. The mapped illuminants are encoded into a compact camera fingerprint embedding (CFE) that enables the network to adapt to unseen cameras. To prevent overfitting due to limited cameras and CCMs during training, we introduce a data augmentation technique that interpolates between cameras and their CCMs. Experimental results across multiple datasets and backbones show that our method achieves state-of-the-art cross-camera color constancy while remaining lightweight and relying only on data readily available in camera ISPs.
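A minimal sketch of the fingerprint construction described above: predefined illuminant colors in a standard space are mapped into a camera's raw space through the inverse of its calibrated CCM. The CCM and illuminant values below are made-up placeholders, not real calibration data.

```python
import numpy as np

# Hypothetical calibrated CCM: camera raw -> CIE XYZ.
ccm = np.array([[0.6, 0.3, 0.1],
                [0.2, 0.7, 0.1],
                [0.1, 0.2, 0.7]])

# Stand-ins for Planckian-locus illuminant colors in the standard space.
illuminants_std = np.array([
    [1.09, 1.00, 0.35],   # warm (low correlated color temperature)
    [0.95, 1.00, 1.09],   # daylight-ish
    [0.88, 1.00, 1.55],   # cool (high correlated color temperature)
])

# Map each standard-space illuminant into this camera's raw space:
# xyz = ccm @ raw, hence raw = inv(ccm) @ xyz (row-vector form below).
raw_illuminants = illuminants_std @ np.linalg.inv(ccm).T

# Normalize by green and flatten into a compact per-camera descriptor,
# analogous in spirit to the camera fingerprint embedding.
fingerprint = (raw_illuminants / raw_illuminants[:, 1:2]).ravel()
print(fingerprint)
```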
Submitted 15 December, 2025; v1 submitted 10 April, 2025;
originally announced April 2025.
-
Time-Aware Auto White Balance in Mobile Photography
Authors:
Mahmoud Afifi,
Luxi Zhao,
Abhijith Punnappurath,
Mohammed A. Abdelsalam,
Ran Zhang,
Michael S. Brown
Abstract:
Cameras rely on auto white balance (AWB) to correct undesirable color casts caused by scene illumination and the camera's spectral sensitivity. This is typically achieved using an illuminant estimator that determines the global color cast solely from the color information in the camera's raw sensor image. Mobile devices provide valuable additional metadata, such as capture timestamp and geolocation, that offers strong contextual clues to help narrow down the possible illumination solutions. This paper proposes a lightweight illuminant estimation method that incorporates such contextual metadata, along with additional capture information and image colors, into a compact model (~5K parameters), achieving promising results, matching or surpassing larger models. To validate our method, we introduce a dataset of 3,224 smartphone images with contextual metadata collected at various times of day and under diverse lighting conditions. The dataset includes ground-truth illuminant colors, determined using a color chart, and user-preferred illuminants validated through a user study, providing a comprehensive benchmark for AWB evaluation.
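A minimal sketch of turning capture metadata into model features: time of day is encoded cyclically so 23:59 and 00:01 are close, and geolocation is mapped onto the unit sphere to avoid the longitude wrap-around. The exact features used by the paper's model are not specified here; these are illustrative assumptions.

```python
import numpy as np

def metadata_features(hour: float, lat_deg: float, lon_deg: float) -> np.ndarray:
    t = 2 * np.pi * hour / 24.0
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.array([
        np.sin(t), np.cos(t),                # cyclic time of day
        np.cos(lat) * np.cos(lon),           # geolocation as a 3D point on
        np.cos(lat) * np.sin(lon),           # the unit sphere, which avoids
        np.sin(lat),                         # the longitude discontinuity
    ])

print(metadata_features(hour=17.5, lat_deg=43.77, lon_deg=-79.50))
```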
Submitted 25 June, 2025; v1 submitted 7 April, 2025;
originally announced April 2025.
-
Multispectral Demosaicing via Dual Cameras
Authors:
SaiKiran Tedla,
Junyong Lee,
Beixuan Yang,
Mahmoud Afifi,
Michael S. Brown
Abstract:
Multispectral (MS) images capture detailed scene information across a wide range of spectral bands, making them invaluable for applications requiring rich spectral data. Integrating MS imaging into multi-camera devices, such as smartphones, has the potential to enhance both spectral applications and RGB image quality. A critical step in processing MS data is demosaicing, which reconstructs color information from the mosaic MS images captured by the camera. This paper proposes a method for MS image demosaicing specifically designed for dual-camera setups where both RGB and MS cameras capture the same scene. Our approach leverages co-captured RGB images, which typically have higher spatial fidelity, to guide the demosaicing of lower-fidelity MS images. We introduce the Dual-camera RGB-MS Dataset, a large collection of paired RGB and MS mosaiced images with ground-truth demosaiced outputs, which enables training and evaluation of our method. Experimental results demonstrate that our method achieves state-of-the-art accuracy compared to existing techniques.
Submitted 25 July, 2025; v1 submitted 27 March, 2025;
originally announced March 2025.
-
Color Matching Using Hypernetwork-Based Kolmogorov-Arnold Networks
Authors:
Artem Nikonorov,
Georgy Perevozchikov,
Andrei Korepanov,
Nancy Mehta,
Mahmoud Afifi,
Egor Ershov,
Radu Timofte
Abstract:
We present cmKAN, a versatile framework for color matching. Given an input image with colors from a source color distribution, our method effectively and accurately maps these colors to match a target color distribution in both supervised and unsupervised settings. Our framework leverages the spline capabilities of Kolmogorov-Arnold Networks (KANs) to model the color matching between source and target distributions. Specifically, we developed a hypernetwork that generates spatially varying weight maps to control the nonlinear splines of a KAN, enabling accurate color matching. As part of this work, we introduce the first large-scale dataset of paired images captured by two distinct cameras and evaluate the efficacy of our method and existing ones in matching colors. We evaluated our approach across various color-matching tasks, including: (1) raw-to-raw mapping, where the source color distribution is in one camera's raw color space and the target in another camera's raw space; (2) raw-to-sRGB mapping, where the source color distribution is in a camera's raw space and the target is in the display sRGB space, emulating the color rendering of a camera ISP; and (3) sRGB-to-sRGB mapping, where the goal is to transfer colors from a source sRGB space (e.g., produced by a source camera ISP) to a target sRGB space (e.g., from a different camera ISP). The results show that our method outperforms existing approaches by 37.3% on average for supervised and unsupervised cases while remaining lightweight compared to other methods. The code, dataset, and pre-trained models are available at: https://github.com/gosha20777/cmKAN
Submitted 14 March, 2025;
originally announced March 2025.
-
Characterizing the Networks Sending Enterprise Phishing Emails
Authors:
Elisa Luo,
Liane Young,
Grant Ho,
M. H. Afifi,
Marco Schweighauser,
Ethan Katz-Bassett,
Asaf Cidon
Abstract:
Phishing attacks on enterprise employees present one of the most costly and potent threats to organizations. We explore an understudied facet of enterprise phishing attacks: the email relay infrastructure behind successfully delivered phishing emails. We draw on a dataset spanning one year across thousands of enterprises, billions of emails, and over 800,000 delivered phishing attacks. Our work sheds light on the network origins of phishing emails received by real-world enterprises, differences in email traffic we observe from networks sending phishing emails, and how these characteristics change over time.
Surprisingly, we find that over one-third of the phishing email in our dataset originates from highly reputable networks, including Amazon and Microsoft. Their total volume of phishing email is consistently high across multiple months in our dataset, even though the overwhelming majority of email sent by these networks is benign. In contrast, we observe that a large portion of phishing emails originate from networks where the vast majority of emails they send are phishing, but their email traffic is not consistent over time. Taken together, our results explain why no singular defense strategy, such as static blocklists (which are commonly used in email security filters deployed by organizations in our dataset), is effective at blocking enterprise phishing. Based on our offline analysis, we partnered with a large email security company to deploy a classifier that uses dynamically updated network-based features. In a production environment over a period of 4.5 months, our new detector was able to identify 3-5% more enterprise email attacks that were previously undetected by the company's existing classifiers.
Submitted 16 December, 2024;
originally announced December 2024.
-
Small Contributions, Small Networks: Efficient Neural Network Pruning Based on Relative Importance
Authors:
Mostafa Hussien,
Mahmoud Afifi,
Kim Khoa Nguyen,
Mohamed Cheriet
Abstract:
Recent advancements have scaled neural networks to unprecedented sizes, achieving remarkable performance across a wide range of tasks. However, deploying these large-scale models on resource-constrained devices poses significant challenges due to substantial storage and computational requirements. Neural network pruning has emerged as an effective technique to mitigate these limitations by reducing model size and complexity. In this paper, we introduce an intuitive and interpretable pruning method based on activation statistics, rooted in information theory and statistical analysis. Our approach leverages the statistical properties of neuron activations to identify and remove weights with minimal contributions to neuron outputs. Specifically, we build a distribution of weight contributions across the dataset and utilize its parameters to guide the pruning process. Furthermore, we propose a Pruning-aware Training strategy that incorporates an additional regularization term to enhance the effectiveness of our pruning method. Extensive experiments on multiple datasets and network architectures demonstrate that our method consistently outperforms several baseline and state-of-the-art pruning techniques.
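A minimal sketch of contribution-based pruning in the spirit described above: each weight's importance is estimated from the statistics of its contribution (weight times activation) over a dataset, and the smallest contributors are masked out. The global-percentile threshold rule is an assumption, not necessarily the paper's exact criterion.

```python
import torch

def contribution_prune(weight: torch.Tensor, activations: torch.Tensor,
                       sparsity: float = 0.5) -> torch.Tensor:
    """weight: (out, in); activations: (n_samples, in). Returns a 0/1 mask."""
    # Mean absolute contribution of each weight to its neuron's pre-activation.
    contrib = weight.abs() * activations.abs().mean(dim=0)  # (out, in)
    threshold = torch.quantile(contrib.flatten(), sparsity)
    return (contrib > threshold).float()

torch.manual_seed(0)
w = torch.randn(8, 16)
acts = torch.rand(256, 16)          # activations collected over a dataset
mask = contribution_prune(w, acts, sparsity=0.5)
w_pruned = w * mask
print(f"kept {int(mask.sum())} of {mask.numel()} weights")
```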
Submitted 21 October, 2024;
originally announced October 2024.
-
What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models
Authors:
Abdelrahman Abdelhamed,
Mahmoud Afifi,
Alec Go
Abstract:
Large language models (LLMs) have been effectively used for many computer vision tasks, including image classification. In this paper, we present a simple yet effective approach for zero-shot image classification using multimodal LLMs. Using multimodal LLMs, we generate comprehensive textual representations from input images. These textual representations are then utilized to generate fixed-dimensional features in a cross-modal embedding space. Subsequently, these features are fused together to perform zero-shot classification using a linear classifier. Our method does not require prompt engineering for each dataset; instead, we use a single, straightforward set of prompts across all datasets. We evaluated our method on several datasets and our results demonstrate its remarkable effectiveness, surpassing benchmark accuracy on multiple datasets. On average, for ten benchmarks, our method achieved an accuracy gain of 6.2 percentage points, with an increase of 6.8 percentage points on the ImageNet dataset, compared to prior methods re-evaluated with the same setup. Our findings highlight the potential of multimodal LLMs to enhance computer vision tasks such as zero-shot image classification, offering a significant improvement over traditional methods.
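A self-contained sketch of the pipeline described above. A real run would query a multimodal LLM for the image description and use a cross-modal text encoder; both are replaced by placeholders here so the control flow executes end to end.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64
classes = ["cat", "dog", "car"]

def mllm_describe(image) -> str:
    # Placeholder for the multimodal LLM's comprehensive description.
    return "a small furry animal with whiskers sitting on a sofa"

def embed_text(text: str) -> np.ndarray:
    # Placeholder for a cross-modal text encoder (e.g., a CLIP text tower).
    v = rng.normal(size=DIM)
    return v / np.linalg.norm(v)

description_feat = embed_text(mllm_describe(image=None))
class_feats = np.stack([embed_text(f"a photo of a {c}") for c in classes])

# A linear classifier whose weights are the class embeddings: classification
# reduces to similarity scores in the shared embedding space.
scores = class_feats @ description_feat
print(classes[int(np.argmax(scores))])
```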
Submitted 25 June, 2025; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs
Authors:
Georgy Perevozchikov,
Nancy Mehta,
Mahmoud Afifi,
Radu Timofte
Abstract:
Modern smartphone camera quality heavily relies on the image signal processor (ISP) to enhance captured raw images, utilizing carefully designed modules to produce final output images encoded in a standard color space (e.g., sRGB). Neural-based end-to-end learnable ISPs offer promising advancements, potentially replacing traditional ISPs with their ability to adapt without requiring extensive tuning for each new camera model, as is often the case for nearly every module in traditional ISPs. However, the key challenge with recent learning-based ISPs is the need to collect large paired datasets for each distinct camera model due to the influence of intrinsic camera characteristics on the formation of input raw images. This paper tackles this challenge by introducing a novel method for unpaired learning of raw-to-raw translation across diverse cameras. Specifically, we propose Rawformer, an unsupervised Transformer-based encoder-decoder method for raw-to-raw translation. It accurately maps raw images captured by a certain camera to the target camera, facilitating the generalization of learnable ISPs to new unseen cameras. Our method demonstrates superior performance on real camera datasets, achieving higher accuracy compared to previous state-of-the-art techniques, and preserving a more robust correlation between the original and translated raw images. The code and pretrained models are available at https://github.com/gosha20777/rawformer.
Submitted 15 July, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection
Authors:
Mohamed Afifi,
Mohamed ElHelw
Abstract:
Perception is a key element for enabling intelligent autonomous navigation. Understanding the semantics of the surrounding environment and accurate vehicle pose estimation are essential capabilities for autonomous vehicles, including self-driving cars and mobile robots that perform complex tasks. Fast-moving platforms like self-driving cars pose a hard challenge for localization and mapping algorithms. In this work, we propose a novel framework for real-time LiDAR odometry and mapping based on the LOAM architecture for fast-moving platforms. Our framework utilizes semantic information produced by a deep learning model to improve point-to-line and point-to-plane matching between LiDAR scans and build a semantic map of the environment, leading to more accurate motion estimation using LiDAR data. We observe that including semantic information in the matching process introduces a new type of outlier match, where matches occur between different objects of the same semantic class. To this end, we propose a novel algorithm that explicitly identifies and discards potential outliers in the matching process. In our experiments, we study the effect of improving the matching process on the robustness of LiDAR odometry against high-speed motion. Our experimental evaluations on the KITTI dataset demonstrate that utilizing semantic information and rejecting outliers significantly enhance the robustness of LiDAR odometry and mapping when there are large gaps between scan acquisition poses, which is typical for fast-moving platforms.
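A minimal sketch of the basic semantic consistency check described above: a candidate correspondence between two LiDAR scans is kept only if both points carry the same semantic label. The distance gate is an extra, assumed heuristic; the paper's novel rejection of outliers within the same class is more involved.

```python
import numpy as np

def filter_matches(src_pts, dst_pts, src_labels, dst_labels,
                   matches, max_dist=1.0):
    """matches: (N, 2) array of (src_idx, dst_idx) candidate pairs."""
    kept = []
    for i, j in matches:
        if src_labels[i] != dst_labels[j]:
            continue  # reject cross-class matches (e.g., car matched to fence)
        if np.linalg.norm(src_pts[i] - dst_pts[j]) > max_dist:
            continue  # reject geometrically implausible matches
        kept.append((i, j))
    return np.array(kept)

rng = np.random.default_rng(0)
src = rng.uniform(0, 10, size=(100, 3))
dst = src + rng.normal(0, 0.05, size=(100, 3))      # slightly moved scan
labels = rng.integers(0, 4, size=100)               # e.g., road/car/building/pole
cand = np.stack([np.arange(100), rng.permutation(100)], axis=1)
print(len(filter_matches(src, dst, labels, labels, cand)))
```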
Submitted 5 March, 2024;
originally announced March 2024.
-
Optimizing Illuminant Estimation in Dual-Exposure HDR Imaging
Authors:
Mahmoud Afifi,
Zhenhua Hu,
Liang Liang
Abstract:
High dynamic range (HDR) imaging involves capturing a series of frames of the same scene, each with different exposure settings, to broaden the dynamic range of light. This can be achieved through burst capturing or using staggered HDR sensors that capture long and short exposures simultaneously in the camera image signal processor (ISP). Within the camera ISP pipeline, illuminant estimation is a crucial step that aims to estimate the color of the global illuminant in the scene. This estimate is used by the camera ISP's white-balance module to remove undesirable color casts in the final image. Despite the multiple frames captured in the HDR pipeline, conventional illuminant estimation methods often rely only on a single frame of the scene. In this paper, we explore leveraging information from frames captured with different exposure times. Specifically, we introduce a simple feature extracted from dual-exposure images to guide illuminant estimators, referred to as the dual-exposure feature (DEF). To validate the efficiency of DEF, we employed two illuminant estimators using the proposed DEF: 1) a multilayer perceptron network (MLP), referred to as the exposure-based MLP (EMLP), and 2) a modified version of convolutional color constancy (CCC) that integrates our DEF, which we call ECCC. Both EMLP and ECCC achieve promising results, in some cases surpassing prior methods that require hundreds of thousands or millions of parameters, with only a few hundred parameters for EMLP and a few thousand parameters for ECCC.
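A hedged sketch of guiding a tiny illuminant-estimation MLP with a compact feature computed from two exposures, in the spirit of the EMLP described above. The specific feature below (channel statistics of both exposures and their ratio) is an illustrative stand-in, not the paper's exact DEF.

```python
import torch
import torch.nn as nn

def dual_exposure_feature(short_img, long_img, eps=1e-6):
    """Both images: (H, W, 3) linear raw tensors. Returns a 9-dim feature."""
    ratio = long_img / (short_img + eps)
    feats = [short_img.mean(dim=(0, 1)),
             long_img.mean(dim=(0, 1)),
             ratio.mean(dim=(0, 1))]
    return torch.cat(feats)

mlp = nn.Sequential(nn.Linear(9, 12), nn.ReLU(), nn.Linear(12, 3))
print(sum(p.numel() for p in mlp.parameters()))  # 159 parameters: a few hundred

short = torch.rand(32, 32, 3) * 0.2       # short exposure (darker)
long = (short * 8).clamp(0, 1)            # long exposure, 8x exposure time
illum = mlp(dual_exposure_feature(short, long))
print(illum / illum.norm())               # unit-norm illuminant estimate
```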
Submitted 6 April, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Multi-View Motion Synthesis via Applying Rotated Dual-Pixel Blur Kernels
Authors:
Abdullah Abuolaim,
Mahmoud Afifi,
Michael S. Brown
Abstract:
Portrait mode is widely available on smartphone cameras to provide an enhanced photographic experience. One of the primary effects applied to images captured in portrait mode is a synthetic shallow depth of field (DoF). The synthetic DoF (or bokeh effect) selectively blurs regions in the image to emulate the effect of using a large lens with a wide aperture. In addition, many applications now incorporate a new image motion attribute (NIMAT) to emulate background motion, where the motion is correlated with estimated depth at each pixel. In this work, we follow the trend of rendering the NIMAT effect by introducing a modification to the blur synthesis procedure in portrait mode. In particular, our modification enables a high-quality synthesis of multi-view bokeh from a single image by applying rotated blurring kernels. Given the synthesized multiple views, we can generate aesthetically realistic image motion similar to the NIMAT effect. We validate our approach qualitatively against the original NIMAT effect and other similar image motions, such as Facebook's 3D image. Our image motion demonstrates a smooth image view transition with fewer artifacts around the object boundary.
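A minimal sketch of the rotated-kernel idea: an anisotropic blur kernel is rotated to several orientations and applied to the same image, yielding a slightly different rendering per view. Kernel shape and angles are illustrative; the paper applies this inside a depth-aware portrait-mode pipeline rather than globally.

```python
import numpy as np
from scipy.ndimage import convolve, rotate

def anisotropic_kernel(size=15, width=3):
    """A horizontal bar kernel: blur is stronger along one axis."""
    k = np.zeros((size, size))
    k[size // 2 - width // 2 : size // 2 + width // 2 + 1, :] = 1.0
    return k / k.sum()

img = np.random.rand(64, 64)   # stand-in for one image channel
views = []
for angle in (0, 45, 90, 135):
    k = rotate(anisotropic_kernel(), angle, reshape=False, order=1)
    k = np.clip(k, 0, None)
    k /= k.sum()                              # keep the kernel normalized
    views.append(convolve(img, k, mode="reflect"))
print(len(views), views[0].shape)  # 4 (64, 64)
```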
Submitted 15 November, 2021;
originally announced November 2021.
-
A Deep Learning-based Approach for Real-time Facemask Detection
Authors:
Wadii Boulila,
Ayyub Alzahem,
Aseel Almoudi,
Muhanad Afifi,
Ibrahim Alturki,
Maha Driss
Abstract:
The COVID-19 pandemic is causing a global health crisis. Public spaces need to be safeguarded from the adverse effects of this pandemic. Wearing a facemask has become one of the effective protection solutions adopted by many governments. Manual real-time monitoring of facemask wearing for a large group of people is becoming a difficult task. The goal of this paper is to use deep learning (DL), which has shown excellent results in many real-life applications, to ensure efficient real-time facemask detection. The proposed approach is based on two steps: an offline step that creates a DL model able to detect and locate facemasks and determine whether they are worn appropriately, and an online step that deploys the DL model on edge computing devices to detect masks in real time. In this study, we propose to use MobileNetV2 to detect facemasks in real time. Several experiments were conducted and show the good performance of the proposed approach (99% training and testing accuracy). In addition, comparisons with state-of-the-art models, namely ResNet50, DenseNet, and VGG16, show the good performance of MobileNetV2 in terms of training time and accuracy.
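A minimal sketch of the transfer-learning setup described above: a MobileNetV2 backbone with its 1000-way ImageNet head swapped for a binary mask/no-mask classifier. Pass the ImageNet weights enum to start from pretraining; it is omitted here so the sketch runs offline.

```python
import torch
import torch.nn as nn
from torchvision import models

# Use weights=models.MobileNet_V2_Weights.IMAGENET1K_V1 for pretraining.
model = models.mobilenet_v2(weights=None)
model.classifier[1] = nn.Linear(model.last_channel, 2)  # mask / no mask

x = torch.rand(1, 3, 224, 224)   # one RGB face crop, standard input size
print(model(x).shape)            # torch.Size([1, 2])
```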
Submitted 17 October, 2021;
originally announced October 2021.
-
Auto White-Balance Correction for Mixed-Illuminant Scenes
Authors:
Mahmoud Afifi,
Marcus A. Brubaker,
Michael S. Brown
Abstract:
Auto white balance (AWB) is applied by camera hardware at capture time to remove the color cast caused by the scene illumination. The vast majority of white-balance algorithms assume a single light source illuminates the scene; however, real scenes often have mixed lighting conditions. This paper presents an effective AWB method to deal with such mixed-illuminant scenes. In a unique departure from conventional AWB, our method does not require illuminant estimation, as is the case in traditional camera AWB modules. Instead, our method proposes to render the captured scene with a small set of predefined white-balance settings. Given this set of rendered images, our method learns to estimate weighting maps that are used to blend the rendered images to generate the final corrected image. Through extensive experiments, we show this proposed method produces promising results compared to other alternatives for single- and mixed-illuminant scene color correction. Our source code and trained models are available at https://github.com/mahmoudnafifi/mixedillWB.
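A minimal sketch of the blending idea described above: render the capture with a small set of fixed WB settings and blend them with per-pixel weighting maps. The weights below are random softmax placeholders standing in for the network's predicted maps, and the WB gains are illustrative values.

```python
import numpy as np

def render_with_wb(raw, gains):
    return np.clip(raw * np.asarray(gains), 0.0, 1.0)

rng = np.random.default_rng(0)
raw = rng.random((64, 64, 3)).astype(np.float32)

# Small set of predefined WB settings (e.g., tungsten, daylight, shade).
wb_settings = [(1.6, 1.0, 2.4), (2.0, 1.0, 1.7), (2.4, 1.0, 1.3)]
renders = np.stack([render_with_wb(raw, g) for g in wb_settings])  # (K, H, W, 3)

# Placeholder per-pixel weights; in the method these come from a network
# and sum to one over the K renders.
logits = rng.normal(size=(len(wb_settings), 64, 64, 1))
weights = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

corrected = (weights * renders).sum(axis=0)          # final blended image
print(corrected.shape, weights.sum(axis=0).mean())   # (64, 64, 3) 1.0
```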
Submitted 7 October, 2021; v1 submitted 17 September, 2021;
originally announced September 2021.
-
Improving Single-Image Defocus Deblurring: How Dual-Pixel Images Help Through Multi-Task Learning
Authors:
Abdullah Abuolaim,
Mahmoud Afifi,
Michael S. Brown
Abstract:
Many camera sensors use a dual-pixel (DP) design that operates as a rudimentary light field providing two sub-aperture views of a scene in a single capture. The DP sensor was developed to improve how cameras perform autofocus. Since the DP sensor's introduction, researchers have found additional uses for the DP data, such as depth estimation, reflection removal, and defocus deblurring. We are interested in the latter task of defocus deblurring. In particular, we propose a single-image deblurring network that incorporates the two sub-aperture views into a multi-task framework. Specifically, we show that jointly learning to predict the two DP views from a single blurry input image improves the network's ability to learn to deblur the image. Our experiments show this multi-task strategy achieves +1dB PSNR improvement over state-of-the-art defocus deblurring methods. In addition, our multi-task framework allows accurate DP-view synthesis (e.g., ~39dB PSNR) from the single input image. These high-quality DP views can be used for other DP-based applications, such as reflection removal. As part of this effort, we have captured a new dataset of 7,059 high-quality images to support our training for the DP-view synthesis task. Our dataset, code, and trained models are publicly available at https://github.com/Abdullah-Abuolaim/multi-task-defocus-deblurring-dual-pixel-nimat.
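A toy sketch of the multi-task idea described above: one shared encoder with three heads predicting the deblurred image and the two DP sub-aperture views, trained with a summed reconstruction loss. The architecture is a deliberately tiny assumption, not the paper's network.

```python
import torch
import torch.nn as nn

class MultiTaskDeblur(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.deblur = nn.Conv2d(ch, 3, 3, padding=1)     # main task
        self.left_dp = nn.Conv2d(ch, 3, 3, padding=1)    # auxiliary DP view
        self.right_dp = nn.Conv2d(ch, 3, 3, padding=1)   # auxiliary DP view
    def forward(self, blurry):
        f = self.encoder(blurry)
        return self.deblur(f), self.left_dp(f), self.right_dp(f)

model = MultiTaskDeblur()
blurry = torch.rand(2, 3, 64, 64)
sharp = torch.rand_like(blurry)                 # placeholder targets
left, right = torch.rand_like(blurry), torch.rand_like(blurry)

pred_sharp, pred_l, pred_r = model(blurry)
# Jointly learning the DP views acts as a regularizer for deblurring.
loss = sum(nn.functional.l1_loss(p, t)
           for p, t in [(pred_sharp, sharp), (pred_l, left), (pred_r, right)])
loss.backward()
print(float(loss))
```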
Submitted 9 February, 2022; v1 submitted 11 August, 2021;
originally announced August 2021.
-
Image color correction, enhancement, and editing
Authors:
Mahmoud Afifi
Abstract:
This thesis presents methods and approaches to image color correction, color enhancement, and color editing. To begin, we study the color correction problem from the standpoint of the camera's image signal processor (ISP). A camera's ISP is hardware that applies a series of in-camera image processing and color manipulation steps, many of which are nonlinear in nature, to render the initial sensor image to its final photo-finished representation saved in the 8-bit standard RGB (sRGB) color space. As white balance (WB) is one of the major procedures applied by the ISP for color correction, this thesis presents two different methods for ISP white balancing. Afterward, we discuss another scenario of correcting and editing image colors, where we present a set of methods to correct and edit WB settings for images that have been improperly white-balanced by the ISP. Then, we explore another factor that has a significant impact on the quality of camera-rendered colors, in which we outline two different methods to correct exposure errors in camera-rendered images. Lastly, we discuss post-capture auto color editing and manipulation. In particular, we propose auto image recoloring methods to generate different realistic versions of the same camera-rendered image with new colors. Through extensive evaluations, we demonstrate that our methods provide superior solutions compared to existing alternatives targeting color correction, color enhancement, and color editing.
Submitted 27 July, 2021;
originally announced July 2021.
-
CAMS: Color-Aware Multi-Style Transfer
Authors:
Mahmoud Afifi,
Abdullah Abuolaim,
Mostafa Hussien,
Marcus A. Brubaker,
Michael S. Brown
Abstract:
Image style transfer aims to manipulate the appearance of a source image, or "content" image, so that it shares the texture and colors of a target "style" image. Ideally, the style transfer manipulation should also preserve the semantic content of the source image. A commonly used approach to assist in transferring styles is based on Gram matrix optimization. One problem of Gram matrix-based optimization is that it does not consider the correlation between colors and their styles. Specifically, certain textures or structures should be associated with specific colors. This is particularly challenging when the target style image exhibits multiple style types. In this work, we propose a color-aware multi-style transfer method that generates aesthetically pleasing results while preserving the style-color correlation between style and generated images. We achieve this desired outcome by introducing a simple but efficient modification to classic Gram matrix-based style transfer optimization. A nice feature of our method is that it enables users to manually select the color associations between the target style and content image for more transfer flexibility. We validated our method with several qualitative comparisons, including a user study conducted with 30 participants. In comparison with prior work, our method is simple, easy to implement, and achieves visually appealing results when targeting images that have multiple styles. Source code is available at https://github.com/mahmoudnafifi/color-aware-style-transfer.
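A minimal sketch of the Gram-matrix statistic that classic style-transfer optimization matches, plus a masked variant hinting at how style statistics can be restricted to a color-associated region. The masking scheme here is an assumption, not the paper's exact formulation.

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """features: (C, H, W) activations; returns (C, C) channel correlations."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (h * w)

feats = torch.rand(16, 32, 32)
mask = (torch.rand(32, 32) > 0.5).float()   # region associated with one color
g_full = gram_matrix(feats)
g_masked = gram_matrix(feats * mask)        # style statistics for that region only
print(g_full.shape, g_masked.shape)         # torch.Size([16, 16]) twice
```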
Submitted 4 September, 2021; v1 submitted 25 June, 2021;
originally announced June 2021.
-
Semi-Supervised Raw-to-Raw Mapping
Authors:
Mahmoud Afifi,
Abdullah Abuolaim
Abstract:
The raw-RGB colors of a camera sensor vary due to the spectral sensitivity differences across different sensor makes and models. This paper focuses on the task of mapping between different sensor raw-RGB color spaces. Prior work addressed this problem using a pairwise calibration to achieve accurate color mapping. Although accurate, this approach is less practical, as it requires: (1) capturing a pair of images with both camera devices, with a color calibration object placed in each new scene; (2) accurate image alignment or manual annotation of the color calibration object. This paper aims to tackle color mapping in the raw space through a more practical setup. Specifically, we present a semi-supervised raw-to-raw mapping method trained on a small set of paired images alongside an unpaired set of images captured by each camera device. Through extensive experiments, we show that our method achieves better results compared to other domain adaptation alternatives in addition to the single-calibration solution. We have generated a new dataset of raw images from two different smartphone cameras as part of this effort. Our dataset includes unpaired and paired sets for our semi-supervised training and evaluation.
Submitted 6 September, 2021; v1 submitted 25 June, 2021;
originally announced June 2021.
-
Robust Real-Time Pedestrian Detection on Embedded Devices
Authors:
Mohamed Afifi,
Yara Ali,
Karim Amer,
Mahmoud Shaker,
Mohamed Elhelw
Abstract:
Detection of pedestrians on embedded devices, such as those onboard robots and drones, has many applications including road intersection monitoring, security, crowd monitoring and surveillance, to name a few. However, the problem can be challenging due to continuously-changing camera viewpoint and varying object appearances as well as the need for lightweight algorithms suitable for embedded systems. This paper proposes a robust framework for pedestrian detection in diverse video footage. The framework performs fine and coarse detections on different image regions and exploits temporal and spatial characteristics to attain enhanced accuracy and real-time performance on embedded boards. The framework uses the YOLOv3 object detector [1] as its backbone and runs on the Nvidia Jetson TX2 embedded board; however, other detectors and/or boards can be used as well. The performance of the framework is demonstrated on two established datasets and by its second-place finish in the CVPR 2019 Embedded Real-Time Inference (ERTI) Challenge.
Submitted 13 December, 2020;
originally announced December 2020.
-
Cross-Camera Convolutional Color Constancy
Authors:
Mahmoud Afifi,
Jonathan T. Barron,
Chloe LeGendre,
Yun-Ta Tsai,
Francois Bleibel
Abstract:
We present "Cross-Camera Convolutional Color Constancy" (C5), a learning-based method, trained on images from multiple cameras, that accurately estimates a scene's illuminant color from raw images captured by a new camera previously unseen during training. C5 is a hypernetwork-like extension of the convolutional color constancy (CCC) approach: C5 learns to generate the weights of a CCC model that…
▽ More
We present "Cross-Camera Convolutional Color Constancy" (C5), a learning-based method, trained on images from multiple cameras, that accurately estimates a scene's illuminant color from raw images captured by a new camera previously unseen during training. C5 is a hypernetwork-like extension of the convolutional color constancy (CCC) approach: C5 learns to generate the weights of a CCC model that is then evaluated on the input image, with the CCC weights dynamically adapted to different input content. Unlike prior cross-camera color constancy models, which are usually designed to be agnostic to the spectral properties of test-set images from unobserved cameras, C5 approaches this problem through the lens of transductive inference: additional unlabeled images are provided as input to the model at test time, which allows the model to calibrate itself to the spectral properties of the test-set camera during inference. C5 achieves state-of-the-art accuracy for cross-camera color constancy on several datasets, is fast to evaluate (~7 and ~90 ms per image on a GPU or CPU, respectively), and requires little memory (~2 MB), and thus is a practical solution to the problem of calibration-free automatic white balance for mobile photography.
Submitted 10 February, 2022; v1 submitted 23 November, 2020;
originally announced November 2020.
-
HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms
Authors:
Mahmoud Afifi,
Marcus A. Brubaker,
Michael S. Brown
Abstract:
While generative adversarial networks (GANs) can successfully produce high-quality images, they can be challenging to control. Simplifying GAN-based image generation is critical for their adoption in graphic design and artistic work. This goal has led to significant interest in methods that can intuitively control the appearance of images generated by GANs. In this paper, we present HistoGAN, a color histogram-based method for controlling GAN-generated images' colors. We focus on color histograms as they provide an intuitive way to describe image color while remaining decoupled from domain-specific semantics. Specifically, we introduce an effective modification of the recent StyleGAN architecture to control the colors of GAN-generated images specified by a target color histogram feature. We then describe how to expand HistoGAN to recolor real images. For image recoloring, we jointly train an encoder network along with HistoGAN. The recoloring model, ReHistoGAN, is an unsupervised approach trained to encourage the network to keep the original image's content while changing the colors based on the given target histogram. We show that this histogram-based approach offers a better way to control GAN-generated and real images' colors while producing more compelling results compared to existing alternative strategies.
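A minimal sketch of a differentiable color histogram of the kind HistoGAN conditions on: soft binning with a Gaussian kernel keeps gradients flowing to the image (or generator). The bin layout and kernel width are assumptions, not the paper's exact RGB-uv formulation.

```python
import torch

def soft_histogram(values: torch.Tensor, bins: int = 32,
                   lo: float = 0.0, hi: float = 1.0, sigma: float = 0.02):
    """values: (N,) tensor; returns a (bins,) differentiable histogram."""
    centers = torch.linspace(lo, hi, bins)
    weights = torch.exp(-((values[:, None] - centers[None, :]) ** 2)
                        / (2 * sigma ** 2))
    hist = weights.sum(dim=0)
    return hist / hist.sum()

img = torch.rand(64, 64, 3, requires_grad=True)
hist_r = soft_histogram(img[..., 0].reshape(-1))

# Match the red-channel histogram to a uniform target; gradients reach the image.
target = torch.full((32,), 1.0 / 32)
loss = ((hist_r - target) ** 2).sum()
loss.backward()
print(hist_r.shape, img.grad is not None)  # torch.Size([32]) True
```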
Submitted 26 March, 2021; v1 submitted 23 November, 2020;
originally announced November 2020.
-
F-FADE: Frequency Factorization for Anomaly Detection in Edge Streams
Authors:
Yen-Yu Chang,
Pan Li,
Rok Sosic,
M. H. Afifi,
Marco Schweighauser,
Jure Leskovec
Abstract:
Edge streams are commonly used to capture interactions in dynamic networks, such as email, social, or computer networks. The problem of detecting anomalies or rare events in edge streams has a wide range of applications. However, it presents many challenges due to lack of labels, a highly dynamic nature of interactions, and the entanglement of temporal and structural changes in the network. Current methods are limited in their ability to address the above challenges and to efficiently process a large number of interactions. Here, we propose F-FADE, a new approach for detection of anomalies in edge streams, which uses a novel frequency-factorization technique to efficiently model the time-evolving distributions of frequencies of interactions between node-pairs. The anomalies are then determined based on the likelihood of the observed frequency of each incoming interaction. F-FADE is able to handle, in an online streaming setting, a broad variety of anomalies with temporal and structural changes, while requiring only constant memory. Our experiments on one synthetic and six real-world dynamic networks show that F-FADE achieves state-of-the-art performance and can detect anomalies that previous methods are unable to find.
Submitted 5 February, 2021; v1 submitted 9 November, 2020;
originally announced November 2020.
-
Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds
Authors:
Yara Ali Alnaggar,
Mohamed Afifi,
Karim Amer,
Mohamed Elhelw
Abstract:
Semantic segmentation of 3D point cloud data is essential for enhanced high-level perception in autonomous platforms. Furthermore, given the increasing deployment of LiDAR sensors onboard cars and drones, a special emphasis is also placed on non-computationally intensive algorithms that operate on mobile GPUs. Previous efficient state-of-the-art methods relied on 2D spherical projection of point clouds as input for 2D fully convolutional neural networks to balance the accuracy-speed trade-off. This paper introduces a novel approach for 3D point cloud semantic segmentation that exploits multiple projections of the point cloud to mitigate the loss of information inherent in single-projection methods. Our Multi-Projection Fusion (MPF) framework analyzes spherical and bird's-eye view projections using two separate highly efficient 2D fully convolutional models, then combines the segmentation results of both views. The proposed framework is validated on the SemanticKITTI dataset, where it achieved a mIoU of 55.5, which is higher than the state-of-the-art projection-based methods RangeNet++ and PolarNet, while being 1.6x faster than the former and 3.1x faster than the latter.
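A minimal sketch of the spherical (range-image) projection used by projection-based LiDAR segmentation methods like those discussed above. The field-of-view bounds and image size are typical values, not necessarily the paper's configuration.

```python
import numpy as np

def spherical_projection(points, h=64, w=1024, fov_up=3.0, fov_down=-25.0):
    """points: (N, 3) xyz coordinates. Returns an (h, w) range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)
    pitch = np.arcsin(z / np.maximum(depth, 1e-8))
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    u = (0.5 * (1.0 - yaw / np.pi) * w).astype(int) % w        # column index
    v = ((fu - pitch) / (fu - fd) * h).clip(0, h - 1).astype(int)  # row index
    img = np.zeros((h, w), dtype=np.float32)
    img[v, u] = depth      # points mapping to the same pixel overwrite
    return img

pts = np.random.randn(2048, 3) * np.array([10.0, 10.0, 1.0])
print(spherical_projection(pts).shape)  # (64, 1024)
```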
Submitted 6 November, 2020; v1 submitted 3 November, 2020;
originally announced November 2020.
-
AIM 2020: Scene Relighting and Illumination Estimation Challenge
Authors:
Majed El Helou,
Ruofan Zhou,
Sabine Süsstrunk,
Radu Timofte,
Mahmoud Afifi,
Michael S. Brown,
Kele Xu,
Hengxing Cai,
Yuzhong Liu,
Li-Wen Wang,
Zhi-Song Liu,
Chu-Tak Li,
Sourya Dipta Das,
Nisarg A. Shah,
Akashdeep Jassal,
Tongtong Zhao,
Shanshan Zhao,
Sabari Nathan,
M. Parisa Beham,
R. Suganya,
Qing Wang,
Zhongyun Hu,
Xin Huang,
Yaning Li,
Maitreya Suin
, et al. (12 additional authors not shown)
Abstract:
We review the AIM 2020 challenge on virtual image relighting and illumination estimation. This paper presents the novel VIDIT dataset used in the challenge and the different proposed solutions and final evaluation results over the 3 challenge tracks. The first track considered one-to-one relighting; the objective was to relight an input photo of a scene with a different color temperature and illuminant orientation (i.e., light source position). The goal of the second track was to estimate illumination settings, namely the color temperature and orientation, from a given image. Lastly, the third track dealt with any-to-any relighting, thus a generalization of the first track. The target color temperature and orientation, rather than being pre-determined, are instead given by a guide image. Participants were allowed to make use of their track 1 and 2 solutions for track 3. The tracks had 94, 52, and 56 registered participants, respectively, leading to 20 confirmed submissions in the final competition stage.
Submitted 27 September, 2020;
originally announced September 2020.
-
Interactive White Balancing for Camera-Rendered Images
Authors:
Mahmoud Afifi,
Michael S. Brown
Abstract:
White balance (WB) is one of the first photo-finishing steps used to render a captured image to its final output. WB is applied to remove the color cast caused by the scene's illumination. Interactive photo-editing software allows users to manually select different regions in a photo as examples of the illumination for WB correction (e.g., clicking on achromatic objects). Such interactive editing is possible only with images saved in a RAW image format. This is because RAW images have no photo-rendering operations applied and photo-editing software is able to apply WB and other photo-finishing procedures to render the final image. Interactively editing WB in camera-rendered images is significantly more challenging. This is because the camera hardware has already applied WB to the image and subsequent nonlinear photo-processing routines. These nonlinear rendering operations make it difficult to change the WB post-capture. The goal of this paper is to allow interactive WB manipulation of camera-rendered images. The proposed method is an extension of our recent work [Afifi et al., 2019] that proposed a post-capture method for WB correction based on nonlinear color-mapping functions. Here, we introduce a new framework that links the nonlinear color-mapping functions directly to user-selected colors to enable interactive WB manipulation. This new framework is also more efficient in terms of memory and run-time (99% reduction in memory and 3× speed-up). Lastly, we describe how our framework can leverage a simple illumination estimation method (i.e., gray-world) to perform auto-WB correction that is on a par with the WB correction results in [Afifi et al., 2019]. The source code is publicly available at https://github.com/mahmoudnafifi/Interactive_WB_correction.
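A minimal sketch of the gray-world estimate the framework leverages for auto-WB: assume the average scene color is achromatic and scale the channels so their means match. The framework applies such an estimate through its color-mapping functions; the standalone version below is the textbook form.

```python
import numpy as np

def gray_world(img: np.ndarray) -> np.ndarray:
    """img: (H, W, 3) linear image. Returns the white-balanced image."""
    means = img.reshape(-1, 3).mean(axis=0)
    gains = means[1] / means            # anchor gains to the green channel
    return np.clip(img * gains, 0.0, 1.0)

img = np.random.rand(64, 64, 3) * np.array([1.3, 1.0, 0.7])  # warm color cast
balanced = gray_world(img)
print(balanced.reshape(-1, 3).mean(axis=0))  # roughly equal channel means
```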
Submitted 26 September, 2020;
originally announced September 2020.
-
A Large-Scale Analysis of Attacker Activity in Compromised Enterprise Accounts
Authors:
Neil Shah,
Grant Ho,
Marco Schweighauser,
M. H. Afifi,
Asaf Cidon,
David Wagner
Abstract:
We present a large-scale characterization of attacker activity across 111 real-world enterprise organizations. We develop a novel forensic technique for distinguishing between attacker activity and benign activity in compromised enterprise accounts that yields few false positives and enables us to perform fine-grained analysis of attacker behavior. Applying our methods to a set of 159 compromised enterprise accounts, we quantify the duration of time attackers are active in accounts and examine thematic patterns in how attackers access and leverage these hijacked accounts. We find that attackers frequently dwell in accounts for multiple days to weeks, suggesting that delayed (non-real-time) detection can still provide significant value. Based on an analysis of the attackers' timing patterns, we observe two distinct modalities in how attackers access compromised accounts, which could be explained by the existence of a specialized market for hijacked enterprise accounts: where one class of attackers focuses on compromising and selling account access to another class of attackers who exploit the access such hijacked accounts provide. Ultimately, our analysis sheds light on the state of enterprise account hijacking and highlights fruitful directions for a broader space of detection methods, ranging from new features that home in on malicious account behavior to the development of non-real-time detection methods that leverage malicious activity after an attack's initial point of compromise to more accurately identify attacks.
Submitted 28 July, 2020;
originally announced July 2020.
-
CIE XYZ Net: Unprocessing Images for Low-Level Computer Vision Tasks
Authors:
Mahmoud Afifi,
Abdelrahman Abdelhamed,
Abdullah Abuolaim,
Abhijith Punnappurath,
Michael S. Brown
Abstract:
Cameras currently allow access to two image states: (i) a minimally processed linear raw-RGB image state (i.e., raw sensor data) or (ii) a highly-processed nonlinear image state (e.g., sRGB). There are many computer vision tasks that work best with a linear image state, such as image deblurring and image dehazing. Unfortunately, the vast majority of images are saved in the nonlinear image state. Because of this, a number of methods have been proposed to "unprocess" nonlinear images back to a raw-RGB state. However, existing unprocessing methods have a drawback: raw-RGB images are sensor-specific. As a result, it is necessary to know which camera produced the sRGB output and use a method or network tailored for that sensor to properly unprocess it. This paper addresses this limitation by exploiting another camera image state that is not available as an output but is available inside the camera pipeline. In particular, cameras apply a colorimetric conversion step to convert the raw-RGB image to a device-independent space based on the CIE XYZ color space before they apply the nonlinear photo-finishing. Leveraging this canonical image state, we propose a deep learning framework, CIE XYZ Net, that can unprocess a nonlinear image back to the canonical CIE XYZ image. This image can then be processed by any low-level computer vision operator and re-rendered back to the nonlinear image. We demonstrate the usefulness of the CIE XYZ Net on several low-level vision tasks and show significant gains that can be obtained by this processing framework. Code and dataset are publicly available at https://github.com/mahmoudnafifi/CIE_XYZ_NET.
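For context, the colorimetric part of this canonical state is standardized; below is a minimal sketch of the fixed sRGB-to-CIE-XYZ conversion. This is the standard formula, not the learned CIE XYZ Net, which additionally undoes camera-specific photo-finishing.

```python
import numpy as np

# Standard sRGB (D65) to CIE XYZ matrix.
M_SRGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                          [0.2126, 0.7152, 0.0722],
                          [0.0193, 0.1192, 0.9505]])

def srgb_to_xyz(srgb):
    """Invert the sRGB transfer function, then apply the colorimetric
    matrix. `srgb` is an HxWx3 array in [0, 1]."""
    linear = np.where(srgb <= 0.04045,
                      srgb / 12.92,
                      ((srgb + 0.055) / 1.055) ** 2.4)
    return linear @ M_SRGB_TO_XYZ.T
```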
Submitted 22 June, 2020;
originally announced June 2020.
-
NTIRE 2020 Challenge on Real Image Denoising: Dataset, Methods and Results
Authors:
Abdelrahman Abdelhamed,
Mahmoud Afifi,
Radu Timofte,
Michael S. Brown,
Yue Cao,
Zhilu Zhang,
Wangmeng Zuo,
Xiaoling Zhang,
Jiye Liu,
Wendong Chen,
Changyuan Wen,
Meng Liu,
Shuailin Lv,
Yunchao Zhang,
Zhihong Pan,
Baopu Li,
Teng Xi,
Yanwen Fan,
Xiyu Yu,
Gang Zhang,
Jingtuo Liu,
Junyu Han,
Errui Ding,
Songhyun Yu,
Bumjun Park
, et al. (65 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2020 challenge on real image denoising, with a focus on the newly introduced dataset, the proposed methods, and their results. The challenge is a new version of the previous NTIRE 2019 challenge on real image denoising that was based on the SIDD benchmark. This challenge is based on newly collected validation and testing image datasets, and is hence named SIDD+. The challenge has two tracks for quantitatively evaluating image denoising performance in (1) the Bayer-pattern rawRGB and (2) the standard RGB (sRGB) color spaces. Each track had ~250 registered participants. A total of 22 teams, proposing 24 methods, competed in the final phase of the challenge. The methods proposed by the participating teams represent the current state-of-the-art performance in image denoising targeting real noisy images. The newly collected SIDD+ datasets are publicly available at: https://bit.ly/siddplus_data.
Submitted 8 May, 2020;
originally announced May 2020.
-
Deep White-Balance Editing
Authors:
Mahmoud Afifi,
Michael S. Brown
Abstract:
We introduce a deep learning approach to realistically edit an sRGB image's white balance. Cameras capture sensor images that are rendered by their integrated signal processor (ISP) to a standard RGB (sRGB) color space encoding. The ISP rendering begins with a white-balance procedure that is used to remove the color cast of the scene's illumination. The ISP then applies a series of nonlinear color manipulations to enhance the visual quality of the final sRGB image. Recent work by [3] showed that sRGB images that were rendered with the incorrect white balance cannot be easily corrected due to the ISP's nonlinear rendering. The work in [3] proposed a k-nearest neighbor (KNN) solution based on tens of thousands of image pairs. We propose to solve this problem with a deep neural network (DNN) architecture trained in an end-to-end manner to learn the correct white balance. Our DNN maps an input image to two additional white-balance settings corresponding to indoor and outdoor illuminations. Our solution is not only more accurate than the KNN approach in terms of correcting a wrong white-balance setting but also provides the user with the freedom to edit the white balance in the sRGB image to other illumination settings.
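The one-network/multiple-renditions idea can be sketched as a single shared encoder with one small decoder head per white-balance setting; the layer sizes below are illustrative toys, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiWBNet(nn.Module):
    """Toy one-encoder / multi-decoder network: each head re-renders
    the input under a different white-balance setting."""
    def __init__(self, settings=("awb", "indoor", "outdoor")):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.decoders = nn.ModuleDict(
            {name: nn.Conv2d(32, 3, 3, padding=1) for name in settings})

    def forward(self, x):
        feats = self.encoder(x)
        return {name: dec(feats) for name, dec in self.decoders.items()}

renditions = MultiWBNet()(torch.rand(1, 3, 64, 64))  # dict of 3 renditions
```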
Submitted 2 April, 2020;
originally announced April 2020.
-
Learning Multi-Scale Photo Exposure Correction
Authors:
Mahmoud Afifi,
Konstantinos G. Derpanis,
Björn Ommer,
Michael S. Brown
Abstract:
Capturing photographs with wrong exposures remains a major source of errors in camera-based imaging. Exposure problems are categorized as either: (i) overexposed, where the camera exposure was too long, resulting in bright and washed-out image regions, or (ii) underexposed, where the exposure was too short, resulting in dark regions. Both under- and overexposure greatly reduce the contrast and visual appeal of an image. Prior work mainly focuses on underexposed images or general image enhancement. In contrast, our proposed method targets both over- and underexposure errors in photographs. We formulate the exposure correction problem as two main sub-problems: (i) color enhancement and (ii) detail enhancement. Accordingly, we propose a coarse-to-fine deep neural network (DNN) model, trainable in an end-to-end manner, that addresses each sub-problem separately. A key aspect of our solution is a new dataset of over 24,000 images exhibiting the broadest range of exposure values to date, each with a corresponding properly exposed reference image. Our method achieves results on par with existing state-of-the-art methods on underexposed images and yields significant improvements for images suffering from overexposure errors.
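One standard way to realize such a coarse-to-fine color/detail split is an image pyramid in which the coarsest level carries global color and brightness while finer levels carry detail. Below is a minimal Laplacian-pyramid sketch (a plausible decomposition under that assumption, not necessarily the paper's exact one).

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels=3):
    """Split `img` into band-pass detail layers plus a coarse residual.
    A coarse-to-fine model can then enhance color on the residual and
    details on the finer layers."""
    pyramid, current = [], img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyramid.append(current - up)  # band-pass detail layer
        current = down
    pyramid.append(current)           # coarse color/brightness layer
    return pyramid
```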
Submitted 30 March, 2021; v1 submitted 25 March, 2020;
originally announced March 2020.
-
What Else Can Fool Deep Learning? Addressing Color Constancy Errors on Deep Neural Network Performance
Authors:
Mahmoud Afifi,
Michael S Brown
Abstract:
There is active research targeting local image manipulations that can fool deep neural networks (DNNs) into producing incorrect results. This paper examines a type of global image manipulation that can produce similar adverse effects. Specifically, we explore how strong color casts caused by incorrectly applied computational color constancy - referred to as white balance (WB) in photography - negatively impact the performance of DNNs targeting image segmentation and classification. In addition, we discuss how existing image augmentation methods used to improve the robustness of DNNs are not well suited for modeling WB errors. To address this problem, a novel augmentation method is proposed that can emulate accurate color constancy degradation. We also explore pre-processing training and testing images with a recent WB correction algorithm to reduce the effects of incorrectly white-balanced images. We examine both augmentation and pre-processing strategies on different datasets and demonstrate notable improvements on the CIFAR-10, CIFAR-100, and ADE20K datasets.
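A crude stand-in for such WB-error augmentation is a random per-channel color cast applied in an approximately linearized space; the paper's method models the camera's nonlinear rendering more faithfully, so treat this only as a sketch.

```python
import numpy as np

def random_wb_cast(img, strength=0.3, rng=None):
    """Emulate a white-balance error: roughly linearize, apply random
    per-channel gains, then re-apply gamma. `img` is HxWx3 in [0, 1]."""
    rng = rng or np.random.default_rng()
    gains = 1.0 + rng.uniform(-strength, strength, size=3)
    linear = img ** 2.2                             # rough sRGB linearization
    return np.clip(linear * gains, 0.0, 1.0) ** (1.0 / 2.2)
```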
Submitted 14 December, 2019;
originally announced December 2019.
-
Sensor-Independent Illumination Estimation for DNN Models
Authors:
Mahmoud Afifi,
Michael S. Brown
Abstract:
While modern deep neural networks (DNNs) achieve state-of-the-art results for illuminant estimation, it is currently necessary to train a separate DNN for each type of camera sensor. This means when a camera manufacturer uses a new sensor, it is necessary to retrain an existing DNN model with training images captured by the new sensor. This paper addresses this problem by introducing a novel sensor-independent illuminant estimation framework. Our method learns a sensor-independent working space that can be used to canonicalize the RGB values of any arbitrary camera sensor. Our learned space retains the linear property of the original sensor raw-RGB space and allows unseen camera sensors to be used on a single DNN model trained on this working space. We demonstrate the effectiveness of this approach on several different camera sensors and show it provides performance on par with state-of-the-art methods that were trained per sensor.
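At inference time, the key operation reduces to a per-image 3x3 transform that maps sensor raw-RGB into the canonical working space; a sketch follows, with a hypothetical matrix standing in for the learned mapping.

```python
import numpy as np

def to_working_space(raw_rgb, M):
    """Map sensor-specific raw-RGB into a canonical working space via a
    3x3 matrix. In the paper, M is produced by a learned mapping; the
    identity below is a hypothetical stand-in."""
    h, w, _ = raw_rgb.shape
    return (raw_rgb.reshape(-1, 3) @ M.T).reshape(h, w, 3)

canonical = to_working_space(np.random.rand(8, 8, 3), np.eye(3))
```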
Submitted 14 December, 2019;
originally announced December 2019.
-
Robust Real-time Pedestrian Detection in Aerial Imagery on Jetson TX2
Authors:
Mohamed Afifi,
Yara Ali,
Karim Amer,
Mahmoud Shaker,
Mohamed ElHelw
Abstract:
Detection of pedestrians in aerial imagery captured by drones has many applications, including intersection monitoring, patrolling, and surveillance, to name a few. However, the problem is challenging due to the continuously changing camera viewpoint and object appearance, as well as the need for lightweight algorithms that can run on on-board embedded systems. To address this issue, the paper proposes a framework for pedestrian detection in videos based on the YOLO object detection network [6], while achieving a high throughput of more than 5 FPS on the Jetson TX2 embedded board. The framework exploits deep learning for robust operation and uses a pre-trained model without the need for any additional training, which makes it flexible to apply to different setups with a minimal amount of tuning. The method achieves ~81 mAP when applied on a sample video from the Embedded Real-Time Inference (ERTI) Challenge, where pedestrians are monitored by a UAV.
Submitted 16 May, 2019;
originally announced May 2019.
-
Image Posterization Using Fuzzy Logic and Bilateral Filter
Authors:
Mahmoud Afifi
Abstract:
Image posterization converts images with a large number of tones into synthetic images with distinct flat areas and fewer tones. In this technical report, we present the implementation and results of using fuzzy logic to generate a posterized image in a simple and fast way. The image filter is based on fuzzy logic and bilateral filtering: the given image is first blurred to remove small details, and fuzzy logic is then used to classify each pixel into one of three specific categories in order to reduce the number of colors. This filter was developed while building the Specs on Faces dataset in order to add a new level of difficulty to the original face images in the dataset. The filter does not hurt human detection performance; however, it hinders the face detection process. The filter can also be used generally for posterizing images, especially high-contrast ones, to obtain images with vivid colors.
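A compact sketch of the described pipeline: bilateral smoothing to flatten small details, then tone reduction. Simple uniform quantization stands in here for the report's fuzzy-logic pixel classification.

```python
import cv2
import numpy as np

def posterize(img_bgr, levels=3):
    """Blur small details with a bilateral filter, then reduce each
    channel to `levels` flat tones (a stand-in for the fuzzy classifier)."""
    smooth = cv2.bilateralFilter(img_bgr, d=9, sigmaColor=75, sigmaSpace=75)
    step = 256 // levels
    quantized = np.minimum(smooth // step, levels - 1)  # tone class per pixel
    return (quantized * step + step // 2).astype(np.uint8)
```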
Submitted 3 February, 2018;
originally announced February 2018.
-
Semantic White Balance: Semantic Color Constancy Using Convolutional Neural Network
Authors:
Mahmoud Afifi
Abstract:
The goal of computational color constancy is to preserve the perceived colors of objects under different lighting conditions by removing the effect of color casts caused by the scene's illumination. With the rapid development of deep learning based techniques, significant progress has been made in image semantic segmentation. In this work, we exploit the semantic information together with the color and spatial information of the input image in order to remove color casts. We train a convolutional neural network (CNN) model that learns to estimate the illuminant color and gamma correction parameters based on the semantic information of the given image. Experimental results show that feeding the CNN with the semantic information leads to a significant improvement, reducing the error by more than 40%.
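Once the CNN has predicted an illuminant color and a gamma parameter, applying the correction itself is simple (von Kries-style scaling plus gamma); a sketch with hypothetical predicted values:

```python
import numpy as np

def apply_correction(img, illuminant, gamma):
    """Divide out the estimated illuminant, renormalize, apply gamma.
    `illuminant` and `gamma` would come from the CNN; the values used
    below are hypothetical."""
    balanced = img / np.maximum(illuminant, 1e-6)
    balanced = balanced / balanced.max()   # renormalize to [0, 1]
    return balanced ** (1.0 / gamma)

out = apply_correction(np.random.rand(8, 8, 3),
                       np.array([0.9, 1.0, 0.7]), 2.2)
```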
Submitted 31 May, 2019; v1 submitted 31 January, 2018;
originally announced February 2018.
-
11K Hands: Gender recognition and biometric identification using a large dataset of hand images
Authors:
Mahmoud Afifi
Abstract:
The human hand possesses distinctive features which can reveal gender information. In addition, the hand is considered one of the primary biometric traits used to identify a person. In this work, we propose a large dataset of human hand images (dorsal and palmar sides) with detailed ground-truth information for gender recognition and biometric identification. Using this dataset, a convolutional neural network (CNN) can be trained effectively for the gender recognition task. Based on this, we design a two-stream CNN to tackle the gender recognition problem. This trained model is then used as a feature extractor to feed a set of support vector machine classifiers for the biometric identification task. We show that the dorsal side of hand images, captured by a regular digital camera, conveys distinctive features that are as effective as, if not better than, those available in the palmar hand images. To facilitate access to the proposed dataset and replication of our experiments, the dataset, trained CNN models, and Matlab source code are available at (https://goo.gl/rQJndd).
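The second (identification) stage is a conventional features-to-SVM pipeline; a sketch with random vectors standing in for the two-stream CNN features:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical stand-ins: in the paper the features come from the trained
# two-stream CNN; random vectors here illustrate only the SVM stage.
train_feats = np.random.rand(200, 512)
train_ids = np.random.randint(0, 10, size=200)   # subject identities

clf = SVC(kernel="linear").fit(train_feats, train_ids)
predicted_ids = clf.predict(np.random.rand(5, 512))
```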
Submitted 16 September, 2018; v1 submitted 12 November, 2017;
originally announced November 2017.
-
The Achievement of Higher Flexibility in Multiple Choice-based Tests Using Image Classification Techniques
Authors:
Mahmoud Afifi,
Khaled F. Hussain
Abstract:
In spite of the high accuracy of existing optical mark reading (OMR) systems and devices, a few restrictions remain. In this work, we aim to reduce the restrictions of multiple choice questions (MCQ) within tests. We use an image registration technique to extract the answer boxes from answer sheets. Unlike other systems that rely on simple image processing steps to recognize the extracted answer boxes, we address the problem from another perspective by training a machine learning classifier to recognize the class of each answer box (i.e., confirmed, crossed out, or blank answer). This gives us the ability to deal with a variety of shading and mark patterns, and to distinguish between chosen (i.e., confirmed) and canceled (i.e., crossed out) answers. Because all existing machine learning techniques require a large number of examples in order to train a model for classification, we present a dataset including six real MCQ assessments with different answer sheet templates. We evaluate two classification strategies: a straightforward approach and a two-stage classifier approach. We test two handcrafted feature methods and a convolutional neural network. Finally, we present an easy-to-use graphical user interface for the proposed system. Compared with existing OMR systems, the proposed system has the fewest constraints and achieves high accuracy. We believe that the presented work will further direct the development of OMR systems towards reducing the restrictions of MCQ tests.
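The registration step can be realized with any standard feature-based alignment; one common recipe is ORB features plus a RANSAC homography (the paper does not prescribe these exact primitives), sketched below:

```python
import cv2
import numpy as np

def register_sheet(scanned, template):
    """Align a scanned answer sheet to its template so answer boxes can be
    cropped at known template coordinates. Inputs are grayscale images."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(scanned, None)
    k2, d2 = orb.detectAndCompute(template, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = template.shape[:2]
    return cv2.warpPerspective(scanned, H, (w, h))
```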
Submitted 11 January, 2019; v1 submitted 2 November, 2017;
originally announced November 2017.
-
Can We Boost the Power of the Viola-Jones Face Detector Using Pre-processing? An Empirical Study
Authors:
Mahmoud Afifi,
Marwa Nasser,
Mostafa Korashy,
Katherine Rohde,
Aly Abdelrahim
Abstract:
The Viola-Jones face detection algorithm was (and still is) a quite popular face detector. In spite of the numerous face detection techniques that have been presented recently, many research works are still based on the Viola-Jones algorithm because of its simplicity. In this paper, we study the influence of a set of blind pre-processing methods on the face detection rate of the Viola-Jones algorithm. We focus on two aspects of improvement, specifically badly illuminated faces and blurred faces. Many illumination-invariance and deblurring methods are used in order to improve the detection accuracy. We want to avoid using blind pre-processing methods that may obstruct the face detector. To that end, we perform two sets of experiments. The first set is performed to rule out any blind pre-processing method that may hurt the face detector. The second set is performed to study the effect of the selected pre-processing methods on images that suffer from hard conditions. We present two ways of applying a pre-processing method to the image prior to its use by the Viola-Jones face detector. Four different datasets are used to draw a coherent conclusion about the potential improvement gained by using prior enhanced images. The results demonstrate that some of the pre-processing methods may hurt the accuracy of the Viola-Jones face detection algorithm. However, other pre-processing methods have an evident positive impact on the accuracy of the face detector. Overall, we recommend three simple and fast blind photometric normalization methods as a pre-processing step in order to improve the accuracy of the pre-trained Viola-Jones face detector.
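The experimental setup is easy to reproduce in outline: run the stock OpenCV Viola-Jones cascade with and without a blind photometric normalization. Histogram equalization below is just one simple normalization of the kind studied, not necessarily one of the three the paper recommends.

```python
import cv2

# Pre-trained Viola-Jones cascade shipped with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(img_bgr, preprocess=True):
    """Detect faces, optionally after a blind photometric normalization."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    if preprocess:
        gray = cv2.equalizeHist(gray)  # one simple normalization method
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```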
Submitted 10 December, 2017; v1 submitted 22 September, 2017;
originally announced September 2017.
-
AFIF4: Deep Gender Classification based on AdaBoost-based Fusion of Isolated Facial Features and Foggy Faces
Authors:
Mahmoud Afifi,
Abdelrahman Abdelhamed
Abstract:
Gender classification aims at recognizing a person's gender. Despite the high accuracy achieved by state-of-the-art methods for this task, there is still room for improvement in generalized and unrestricted datasets. In this paper, we advocate a new strategy inspired by the behavior of humans in gender recognition. Instead of dealing with the face image as a sole feature, we rely on the combination of isolated facial features and a holistic feature which we call the foggy face. Then, we use these features to train deep convolutional neural networks followed by an AdaBoost-based score fusion to infer the final gender class. We evaluate our method on four challenging datasets to demonstrate its efficacy in achieving better or on-par accuracy with state-of-the-art methods. In addition, we present a new face dataset that intensifies the challenges of occluded faces and illumination changes, which we believe to be a much-needed resource for gender classification research.
Submitted 17 November, 2017; v1 submitted 13 June, 2017;
originally announced June 2017.
-
Can We See Photosynthesis? Magnifying the Tiny Color Changes of Plant Green Leaves Using Eulerian Video Magnification
Authors:
Islam A. T. F. Taj-Eddin,
Mahmoud Afifi,
Mostafa Korashy,
Ali H. Ahmed,
Ng Yoke Cheng,
Evelyng Hernandez,
Salma M. Abdel-latif
Abstract:
Plant aliveness is typically demonstrated through laboratory experiments and special scientific instruments. In this paper, we aim to detect the degree of animation of plants based on the magnification of the small color changes in the plant's green leaves using Eulerian video magnification. Capturing the video under a controlled environment, e.g., using a tripod and direct current (DC) light sources, reduces camera movements and minimizes light fluctuations; we aim to reduce the external factors as much as possible. The acquired video is then stabilized, and a proposed algorithm is used to reduce the illumination variations. Lastly, Eulerian magnification is utilized to magnify the color changes in the light-invariant video. The proposed system does not require any special-purpose instruments, as it uses a digital camera with a regular frame rate. The results of magnified color changes on both natural and plastic leaves show that live green leaves exhibit color changes, in contrast to plastic leaves. Hence, we can argue that the color changes of the leaves are due to biological operations, such as photosynthesis. To date, this is possibly the first work that focuses on visually interpreting some biological operations of plants without any special-purpose instruments.
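The core of Eulerian magnification is a temporal band-pass filter whose output is amplified and added back to the video; a minimal FFT-based sketch follows (full EVM filters within a spatial pyramid, which is omitted here):

```python
import numpy as np

def magnify_color(frames, fps, lo=0.2, hi=1.0, alpha=50.0):
    """Amplify temporal color variations within [lo, hi] Hz.
    `frames` is a TxHxWx3 float array in [0, 1]."""
    F = np.fft.fft(frames, axis=0)
    freqs = np.fft.fftfreq(frames.shape[0], d=1.0 / fps)
    band = (np.abs(freqs) >= lo) & (np.abs(freqs) <= hi)
    F[~band] = 0.0                            # keep only the band of interest
    bandpassed = np.real(np.fft.ifft(F, axis=0))
    return np.clip(frames + alpha * bandpassed, 0.0, 1.0)
```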
Submitted 29 August, 2017; v1 submitted 12 June, 2017;
originally announced June 2017.
-
ExpSOS: Secure and Verifiable Outsourcing of Exponentiation Operations for Mobile Cloud Computing
Authors:
Kai Zhou,
M. H. Afifi,
Jian Ren
Abstract:
Discrete exponential operations, such as modular exponentiation and scalar multiplication on elliptic curves, are basic operations of many public-key cryptosystems. However, these exponential operations are considered prohibitively expensive for resource-constrained mobile devices. In this paper, we address the problem of secure outsourcing of exponentiation operations to one single untrusted server. Our proposed scheme (ExpSOS) requires only a very limited number of modular multiplications in the local mobile environment, and thus it can achieve an impressive computational gain. ExpSOS also provides a secure verification scheme with probability approximately 1 to ensure that mobile end-users always receive valid results. The comprehensive analysis, as well as simulation results on a real mobile device, demonstrates that our proposed ExpSOS can significantly improve on existing schemes in efficiency, security, and result verifiability. We apply ExpSOS to securely outsource several cryptographic protocols to show that ExpSOS is widely applicable to many cryptographic computations.
Submitted 26 February, 2016;
originally announced February 2016.
-
Optical phase extraction algorithm based on the continuous wavelet and the Hilbert transforms
Authors:
Mustapha Bahich,
Mohamed Afifi,
Elmostafa Barj
Abstract:
In this paper, we present an algorithm for optical phase evaluation based on the wavelet transform technique. The main advantage of this method is that it requires only one fringe pattern. The algorithm relies on a second, π/2 phase-shifted fringe pattern that is computed via the Hilbert transform. To test its validity, the algorithm was used to demodulate a simulated fringe pattern, recovering the phase distribution with good accuracy.
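The Hilbert step is the easy part to sketch: the analytic signal supplies the quadrature (π/2-shifted) fringe, from which the wrapped phase follows directly. This shows only that step, without the wavelet-based analysis of the paper.

```python
import numpy as np
from scipy.signal import hilbert

def extract_phase(fringe_row):
    """Wrapped phase of a 1D fringe profile I(x) = a + b*cos(phi(x)).
    The Hilbert transform provides the pi/2-shifted companion signal."""
    ac = fringe_row - fringe_row.mean()   # remove the background term
    analytic = hilbert(ac)                # ac + j * H{ac}
    return np.angle(analytic)             # wrapped phase in (-pi, pi]

x = np.linspace(0, 10 * np.pi, 1000)
phase = extract_phase(np.cos(x + 0.5 * np.sin(x)))
```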
Submitted 21 May, 2010;
originally announced May 2010.