# Polarization-Aware Low-Light Image Enhancement

Chu Zhou¹, Minggui Teng², Youwei Lyu³, Si Li³, Chao Xu¹, Boxin Shi²*

¹Key Laboratory of Machine Perception (MOE), School of Intelligence Science and Technology, Peking University
²National Engineering Research Center of Visual Technology, School of Computer Science, Peking University
³School of Artificial Intelligence, Beijing University of Posts and Telecommunications

*Corresponding author: shiboxin@pku.edu.cn

Copyright 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

## Abstract

Polarization-based vision algorithms have found uses in various applications since polarization provides additional physical constraints. However, in low-light conditions, their performance can be severely degraded since the captured polarized images could be noisy, leading to noticeable degradation in the degree of polarization (DoP) and the angle of polarization (AoP). Existing low-light image enhancement methods cannot handle polarized images well since they operate in the intensity domain, without effectively exploiting the information provided by polarization. In this paper, we propose a Stokes-domain enhancement pipeline along with a dual-branch neural network to handle the problem in a polarization-aware manner. Two application scenarios (reflection removal and shape from polarization) are presented to show how our enhancement improves their results.

## Introduction

Exploring polarimetric properties of light transport has benefited various vision applications, such as reflection removal (Lei et al. 2020), shape from polarization (Deschaintre, Lin, and Ghosh 2021), and image dehazing (Zhou et al. 2021). Since these applications often need to take full advantage of the physical constraints provided by the unique cues of polarization, their accuracy is closely related to polarization-relevant parameters, such as the degree of polarization (DoP) and the angle of polarization (AoP) of the light incoming to the sensor. With the development of polarization cameras, capturing multiple polarized images of the same scene with different polarizer angles in a snapshot becomes possible, which brings convenience to the acquisition of the DoP and AoP.

However, when taking photos in low-light conditions (e.g., capturing with limited illumination or setting a short exposure time at a high frame rate), the signal-to-noise ratio (SNR) degenerates due to low photon counts. In such a situation, the captured polarized images are noisy, leading to severely degenerated DoP and AoP, and the performance of the corresponding applications is negatively affected (Hu et al. 2020). Therefore, it is of great interest to enhance multiple polarized low-light images of the same scene to acquire the DoP and AoP accurately.

Recent advances in single-image low-light enhancement (Chen et al. 2018; Jiang et al. 2021; Zheng, Shi, and Shi 2021; Guo et al. 2020) have shown effectiveness for imaging in low-light conditions. They adopt neural networks to enhance the visual quality of images in the intensity domain by extracting image features and priors from a large amount of training data.
However, when it comes to handling multiple polarized low-light images of the same scene, they can only process one image at a time in a polarization-unaware manner (i.e., without considering the physical constraints among the polarized images), so the quality of the acquired DoP and AoP is rather unreliable. This situation could be relieved by processing multiple polarized low-light images simultaneously (e.g., using IPLNet (Hu et al. 2020)); however, operating in the intensity domain cannot make effective use of the information provided by polarization. Based on the fact that the polarization characteristics of the light incoming to the sensor are fully encoded in the Stokes parameters, we analyze the average error rates of polarization-relevant variables (including the polarized images, AoP, DoP, and Stokes parameters) w.r.t. the image irradiance reduction factor (caused by decreasing the exposure time or scene radiance) in the polarized low-light image formation model. We observe that the degradation of the Stokes parameters is less severe than that of the polarized images in low-light conditions, and therefore propose a new pipeline that solves the problem in the Stokes domain instead of the intensity domain. Along this pipeline, we design a dual-branch neural network based on the specific properties of the Stokes parameters to perform enhancement in a polarization-aware manner. We also present two application scenarios, including reflection removal (Lei et al. 2020) and shape from polarization (Deschaintre, Lin, and Ghosh 2021), to demonstrate the benefits of enhancing polarized low-light images.

To summarize, this paper makes contributions by proposing: (1) a Stokes-domain enhancement pipeline for polarized low-light images; (2) a polarization-aware dual-branch network tailored to the pipeline; (3) two applications demonstrating the benefits of enhancing polarized low-light images.

## Related Work

Generally, low-light image enhancement methods can be divided into two categories: traditional methods and learning-based methods. Traditional methods often utilize histogram equalization (Pizer et al. 1987) or Retinex theory (Land 1977) to turn the low-light image enhancement problem into a numerical optimization problem. However, the methods using histogram equalization may cause over- and under-enhancement since they often do not take the illumination into consideration, and the methods using Retinex theory often ignore noise removal and may amplify the noise. To increase robustness, learning-based methods have been proposed. Existing learning-based methods are mainly based on supervised learning. They use a large amount of training data to learn the mapping from low-light images to normal-light images (Lore, Akintayo, and Sarkar 2017; Ren et al. 2019; Xu et al. 2020; Li et al. 2020; Lv, Liu, and Lu 2020; Lim and Kim 2020; Wang et al. 2020; Lu and Zhang 2020; Atoum et al. 2020; Ai and Kwon 2020; Li, Feng, and Hua 2021; Zheng, Shi, and Shi 2021; Lv, Li, and Lu 2021), estimate reflectance and illumination maps (Wei et al. 2018; Li et al. 2018; Wang et al. 2019a; Zhang, Zhang, and Guo 2019; Wang et al. 2019b; Fan et al. 2020; Yang et al. 2021b; Zhang et al. 2021b) based on Retinex theory (Land 1977), or reconstruct enhanced images directly from raw low-light images (Chen et al. 2018; Maharjan et al. 2019; Zhu et al. 2020b; Wei et al. 2020; Lamba and Mitra 2021).
Recently, unsupervised learning (Jiang et al. 2021), semi-supervised learning (Yang et al. 2020, 2021a), reinforcement learning (Yu et al. 2018), and zero-shot learning (Zhang et al. 2019; Guo et al. 2020; Zhu et al. 2020a; Liu et al. 2021; Li, Guo, and Chen 2021; Zhao et al. 2021) have also been introduced to solve this challenging problem. To deal with dynamic scenes, learning temporal stability has been considered in some low-light video enhancement methods (Lv et al. 2018; Chen et al. 2019; Jiang and Zheng 2019; Triantafyllidou et al. 2020; Zhang et al. 2021a; Wang et al. 2021). Although these methods have shown effectiveness in a large variety of scenes, they are not suitable for enhancing multiple polarized images since they only focus on enhancing the quality of a single input image and cannot consider the polarization relationship among multiple polarized images. To deal with this problem, Hu et al. (2020) proposed a network, named IPLNet, to enhance multiple polarized low-light images simultaneously. However, it still handles the problem in the intensity domain, which cannot make effective use of the information provided by polarization.

## Method

### Polarized Low-Light Image Formation Model

In normal-light conditions, assuming the camera response function is linear (Lyu et al. 2019; Hu et al. 2020; Zhou et al. 2021), the formation of an image can be described as

$$I = R \cdot t, \tag{1}$$

where $R$ denotes the original scene radiance and $t$ is the sensor exposure time. Note that since in normal-light conditions the SNR of the captured image is sufficiently high, we ignore the noise term (Chen et al. 2018; Hu et al. 2020). When placing a polarizer with polarizer angle $\alpha$ in front of the camera, according to Malus' law (Hecht 2012), the captured polarized image $I_\alpha$ can be calculated as

$$I_\alpha = \frac{1}{2} I \left(1 + p \cos(2(\alpha - \theta))\right), \tag{2}$$

where $p \in [0, 1]$ and $\theta \in [0, \pi]$ denote the DoP and AoP of the light incoming to the sensor, respectively. Reformulating Eq. (2) into a polynomial form, $I_\alpha$ can be expressed as a linear combination of three parameters $S_{0,1,2}$:

$$I_\alpha = \frac{1}{2} S_0 + \frac{1}{2} \cos(2\alpha)\, S_1 + \frac{1}{2} \sin(2\alpha)\, S_2, \quad \text{where } S_0 = I,\; S_1 = I p \cos(2\theta),\; S_2 = I p \sin(2\theta) \tag{3}$$

are called the Stokes parameters (Können 1985) of the light incoming to the sensor. Once $S_{0,1,2}$ are available (from polarized images), $p$ and $\theta$ can be easily acquired by

$$p = \frac{\sqrt{S_1^2 + S_2^2}}{S_0} \quad \text{and} \quad \theta = \frac{1}{2} \arctan\left(\frac{S_2}{S_1}\right). \tag{4}$$

A polarization camera can capture four spatially-aligned and temporally-synchronized polarized images $I_{\alpha_{1,2,3,4}}$ with different polarizer angles $\alpha_{1,2,3,4} = 0°, 45°, 90°, 135°$ in a snapshot¹, which brings convenience to the acquisition of $p$ and $\theta$ (Fig. 1 (a)). This is because, according to the physical meanings of the Stokes parameters², $S_{0,1,2}$ can be computed from $I_{\alpha_{1,2,3,4}}$ directly:

$$S_0 = \frac{1}{2}(I_{\alpha_1} + I_{\alpha_2} + I_{\alpha_3} + I_{\alpha_4}), \quad S_1 = I_{\alpha_3} - I_{\alpha_1}, \quad S_2 = I_{\alpha_4} - I_{\alpha_2}. \tag{5}$$

¹We do not consider the non-linearity in this paper since a polarization camera usually outputs images with a linear camera response function.
²$S_0$ describes the total intensity of the light, and $S_1$ ($S_2$) describes the difference between the intensity of the vertical (135°) and horizontal (45°) polarized light (Können 1985).

However, when it comes to low-light conditions, the SNR of the captured polarized images degenerates due to low photon counts, so the noise term cannot be ignored anymore. The captured polarized low-light images would be degenerated as (we use $\hat{\cdot}$ to denote degenerated variables)

$$\hat{I}_{\alpha_i} = \frac{1}{\gamma} I_{\alpha_i} + N_i \quad (i = 1, 2, 3, 4), \tag{6}$$

where $\gamma$ ($\gamma > 1$) is a linear scaling factor denoting the image irradiance reduction caused by decreasing the exposure time or scene radiance, and $N_i = N(\frac{1}{\gamma}, I_{\alpha_i})$ stands for a noise term which is mainly affected by $\frac{1}{\gamma}$ (Lv, Li, and Lu 2021). Therefore, the computed Stokes parameters and the acquired AoP and DoP would be degenerated correspondingly (Fig. 1 (b)).
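For concreteness, Eqs. (4) and (5) map directly to a few lines of code. Below is a minimal NumPy sketch under the paper's linear-response assumption, taking the four captures as same-shaped float arrays; the `eps` guard and the use of `arctan2` (which resolves the quadrant of the plain arctan in Eq. (4)) are our additions.

```python
import numpy as np

def stokes_from_polarized(i0, i45, i90, i135):
    """Compute the Stokes parameters from the four polarized captures (Eq. (5))."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i90 - i0                       # vertical minus horizontal component
    s2 = i135 - i45                     # 135-degree minus 45-degree component
    return s0, s1, s2

def dop_aop_from_stokes(s0, s1, s2, eps=1e-8):
    """Recover the DoP p and AoP theta from the Stokes parameters (Eq. (4))."""
    p = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)  # degree of polarization
    theta = 0.5 * np.arctan2(s2, s1)             # angle of polarization
    return p, theta
```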
**Error Rate Analysis.** To show that the polarized images, Stokes parameters, DoP, and AoP have different sensitivities to $\gamma$, we analyze the relationships between their average error rates and $\gamma$. We define the average error rate of a variable $x$ ($x$ can be a polarized image or a Stokes parameter normalized to $[0, 1]$) as

$$E_x = \frac{\sum |\gamma \hat{x} - x|}{\sum x}, \tag{7}$$

where $\sum$ denotes the pixel-wise sum. We can see that $E_x$ can be regarded as a function of $\gamma$ given a specific scene (i.e., $I_{\alpha_{1,2,3,4}}$). Combining Eq. (6) and Eq. (7), the relationship between $E_{I_{\text{avg}}}$ ($I_{\text{avg}} = \sum_{i=1}^4 I_{\alpha_i} / 4$) and $\gamma$ can be written as

$$E_{I_{\text{avg}}} = \frac{\gamma \sum |N_{\text{avg}}|}{\sum I_{\text{avg}}} = \gamma K_0(\gamma), \quad \text{where } N_{\text{avg}} = \frac{\sum_{i=1}^4 N_i}{4} \text{ and } K_0(\gamma) = \frac{\sum |N_{\text{avg}}|}{\sum I_{\text{avg}}}. \tag{8}$$

Similarly, combining Eq. (5) and Eq. (7), we obtain the relationship between $E_{S_i}$ and $\gamma$:

$$E_{S_0} = \frac{\gamma \sum |2 N_{\text{avg}}|}{\sum 2 I_{\text{avg}}} = E_{I_{\text{avg}}} = \gamma K_0(\gamma), \quad E_{S_1} = \gamma K_1(\gamma), \quad E_{S_2} = \gamma K_2(\gamma), \tag{9}$$

where $K_1(\gamma) = \frac{\sum |N_3 - N_1|}{\sum S_1}$ and $K_2(\gamma) = \frac{\sum |N_4 - N_2|}{\sum S_2}$. Since $K_1(\gamma)$ and $K_2(\gamma)$ have similar formulations (the numerator is the difference between two noise terms following the same distribution, and the denominator is the difference (normalized to $[0, 1]$) between the intensities of two polarized images), and the numerators of $K_1(\gamma)$ and $K_2(\gamma)$ are close to zero, we can approximately derive

$$K_1(\gamma) \approx K_2(\gamma) < K_0(\gamma), \quad \text{which means} \quad E_{I_{\text{avg}}} = E_{S_0} > E_{S_1} \approx E_{S_2}. \tag{10}$$

Similar to Eq. (7), we define the average error rates of the DoP and AoP (also normalized to $[0, 1]$) as $E_p = \frac{\sum |\hat{p} - p|}{\sum p}$ and $E_\theta = \frac{\sum |\hat{\theta} - \theta|}{\sum \theta}$, respectively³. According to Eq. (4), the relationship between $E_p$ (or $E_\theta$) and $\gamma$ could be quite complicated. We therefore suspect that the DoP and AoP are more sensitive to $\gamma$ than the polarized images and the Stokes parameters. To verify this, we perform simulation on 6000 synthetic scenes to quantitatively obtain the relationships between the average error rates ($E_{I_{\text{avg}}}$, $E_{S_{0,1,2}}$, $E_p$, $E_\theta$) and $\gamma$, as shown in Fig. 1 (c)⁴. We can see that in low-light conditions the degradation in the DoP and AoP is quite noticeable, which can lead to degenerated performance of polarization-based vision applications (Lei et al. 2020; Deschaintre, Lin, and Ghosh 2021).

³Note that there is no need to multiply $\gamma$ by $\hat{p}$ or $\hat{\theta}$ as in Eq. (7), since according to Eq. (4) the $\frac{1}{\gamma}$ factor is canceled in the division operation.
⁴Details of this simulation experiment can be found in the supplementary material.

Figure 1: (a) Normal-light polarized images with the computed DoP and AoP. (b) Low-light polarized images with the degenerated DoP and AoP. (c) Relationships between the average error rates ($E_{I_{\text{avg}}}$, $E_{S_{0,1,2}}$, $E_p$, $E_\theta$) and the linear scaling factor $\gamma$. We visualize the DoP and AoP (normalized to $[0, 1]$) using color maps after averaging their RGB channels (as Hu et al. (2020) do) throughout this paper.
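The error-rate curves of Fig. 1 (c) come from a simulation whose details are in the supplementary material; the sketch below only illustrates the mechanics of Eqs. (6) and (7) with a generic Poisson-Gaussian stand-in for the noise term $N(\frac{1}{\gamma}, I_{\alpha_i})$. The `photons` and `read_sigma` parameters are hypothetical; reproducing the actual curves would require the paper's noise model and scene data.

```python
import numpy as np

rng = np.random.default_rng(0)

def degrade(img, gamma, photons=1000.0, read_sigma=0.005):
    """Simulate Eq. (6): irradiance scaled by 1/gamma, plus signal-dependent
    shot noise and Gaussian read noise (a generic stand-in noise model)."""
    scaled = img / gamma
    shot = rng.poisson(scaled * photons) / photons  # shot noise dominates as gamma grows
    return shot + rng.normal(0.0, read_sigma, img.shape)

def error_rate(x_hat, x, gamma):
    """Average error rate of Eq. (7): E_x = sum(|gamma * x_hat - x|) / sum(x)."""
    return np.abs(gamma * x_hat - x).sum() / x.sum()

# Example: error rate of the average image for one toy scene and one gamma.
intensity = rng.uniform(0.2, 0.8, (64, 64))
p_true, theta_true = 0.4, 1.2  # arbitrary DoP/AoP chosen for this toy scene
angles = np.deg2rad([0.0, 45.0, 90.0, 135.0])
i_alpha = [0.5 * intensity * (1 + p_true * np.cos(2 * (a - theta_true))) for a in angles]

gamma = 5.0
i_hat = [degrade(img, gamma) for img in i_alpha]
e_iavg = error_rate(sum(i_hat) / 4, sum(i_alpha) / 4, gamma)  # E_{I_avg} = gamma * K_0(gamma)
```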
### Stokes-Domain Enhancement Pipeline

We aim to enhance multiple polarized images captured by a polarization camera in low-light conditions to acquire the DoP and AoP with high accuracy. Directly denoising $\hat{p}$ and $\hat{\theta}$ seems a straightforward way to achieve our goal. However, methods designed for image denoising cannot handle this problem since the noise distributions of the DoP and AoP are inherently different from those in the image intensity domain (Hu et al. 2020). According to Eq. (5), a possible solution could be adopting single-image low-light enhancement methods (Jiang et al. 2021; Zheng, Shi, and Shi 2021; Guo et al. 2020) to reconstruct $I_{\alpha_{1,2,3,4}}$ from $\hat{I}_{\alpha_{1,2,3,4}}$. However, these methods cannot consider the polarization relationship among the polarized images and can only process them in a frame-by-frame manner, leaving the acquired DoP and AoP unreliable. Another solution is to implicitly consider the physical constraints of polarization by enhancing $\hat{I}_{\alpha_{1,2,3,4}}$ simultaneously (Hu et al. 2020). However, the information provided by polarization cannot be effectively exploited since this still operates in the intensity domain.

From Eq. (3) we know that the polarization characteristics of the light incoming to the sensor are fully encoded in the Stokes parameters $S_{0,1,2}$, i.e., one can render a polarized image with an arbitrary polarizer angle from them. Besides, from Eq. (10) and Fig. 1 (c) we can see that the level of degradation in $S_0$ is similar to that of the polarized images, while $S_{1,2}$ are less degenerated, which suggests that enhancing the Stokes parameters may provide more reliable results. Therefore, we propose to enhance $\hat{S}_{0,1,2}$ instead of enhancing $\hat{I}_{\alpha_{1,2,3,4}}$ directly. Here, it is not wise to process $\hat{S}_{0,1,2}$ simultaneously since the properties of $S_0$ and $S_{1,2}$ are essentially different: (1) $S_0$ is the unpolarized image (Eq. (3)); (2) $S_{1,2}$ are two similar differential signals, which can be expressed as the differences between two polarized images (Eq. (5)). It might be a better idea to deal with $\hat{S}_0$ and $\hat{S}_{1,2}$ separately based on their different properties. Therefore, we propose a Stokes-domain enhancement pipeline, which adopts two network branches to perform enhancement on $\hat{S}_0$ and $\hat{S}_{1,2}$ independently:

$$S_0 = f_{\text{unpol}}(\hat{S}_0) \quad \text{and} \quad S_{1,2} = f_{\text{diff}}(\hat{S}_{1,2}), \tag{11}$$

where $f_{\text{unpol}}$ denotes the unpolarized branch for enhancing the unpolarized image $\hat{S}_0$ and $f_{\text{diff}}$ denotes the differential branch for enhancing the differential signals $\hat{S}_{1,2}$, which will be detailed in the next subsection. As $S_{0,1,2}$ become available, the DoP $p$ and AoP $\theta$ with high accuracy can be calculated using Eq. (4), and the polarized images $I_{\alpha_{1,2,3,4}}$ can be calculated using Eq. (3).

Figure 2: We design a network tailored to our Stokes-domain enhancement pipeline, which recovers the Stokes parameters $S_{0,1,2}$ from their degenerated counterparts $\hat{S}_{0,1,2}$. It consists of two branches for enhancing $\hat{S}_0$ and $\hat{S}_{1,2}$ respectively, based on their different properties. Then, the DoP $p$ and AoP $\theta$ with high accuracy can be computed from $S_{0,1,2}$ using Eq. (4).
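A minimal PyTorch sketch of the pipeline of Eq. (11) followed by Eq. (4) is given below, assuming `f_unpol` and `f_diff` are the two branch networks (detailed in the next subsection) and that $S_1$ and $S_2$ are stacked along the channel dimension of an `(N, C, H, W)` tensor; the `clamp` guard is our addition.

```python
import torch

def stokes_pipeline(s0_hat, s12_hat, f_unpol, f_diff, eps=1e-8):
    """Sketch of Eq. (11): enhance S0 and (S1, S2) with separate branches,
    then recover the DoP and AoP from the enhanced Stokes parameters (Eq. (4))."""
    s0 = f_unpol(s0_hat)            # unpolarized branch
    s12 = f_diff(s12_hat)           # differential branch
    s1, s2 = s12.chunk(2, dim=1)    # assumes S1, S2 stacked along channels
    p = torch.sqrt(s1 ** 2 + s2 ** 2) / s0.clamp(min=eps)  # DoP, Eq. (4)
    theta = 0.5 * torch.atan2(s2, s1)                      # AoP, Eq. (4)
    return s0, s12, p, theta
```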
### Polarization-Aware Dual-Branch Network

Tailored to our Stokes-domain enhancement pipeline, we design a dual-branch network to enhance $\hat{S}_0$ and $\hat{S}_{1,2}$ independently in a polarization-aware manner based on their different properties, as shown in Fig. 2.

**Unpolarized Branch.** As shown in the upper branch of Fig. 2, it consists of a feature extraction block ($F_a$) to extract high-level features in the latent space, and a backbone network ($g_1$) to perform blind noise suppression and detail enhancement on the pre-amplified input $\gamma \hat{S}_0$. By adopting Tanh as the output activation, it learns the residual between $\gamma \hat{S}_0$ and $S_0$, which can be described as

$$S_0 = f_{\text{unpol}}(\hat{S}_0) = g_1(F_a(\gamma \hat{S}_0)) + \gamma \hat{S}_0. \tag{12}$$

**Differential Branch.** We observe that although most regions in $\gamma \hat{S}_{1,2}$ are degenerated by noise, their edges are less affected, which can provide abundant structure information. Therefore, we propose to explicitly extract their edges (denoted as $E_{1,2}$) using Laplace kernels as priors. As shown in the lower branch of Fig. 2, it consists of two feature extraction blocks ($F_b$ and $F_c$) to extract features from $\gamma \hat{S}_{1,2}$ and their edges respectively, and a backbone network ($g_2$) to complete the reconstruction. This branch can be described as ($i = 1, 2$)

$$S_i = f_{\text{diff}}(\hat{S}_i) = g_2(\text{concat}(F_b(\gamma \hat{S}_i), F_c(E_i))) + \gamma \hat{S}_i. \tag{13}$$

**Layer Details.** The feature extraction blocks $F_a$ and $F_b$ consist of several convolution layers, which are quite simple since they only extract features directly from the inputs. $F_c$ consists of a convolution layer, a dense block (Huang et al. 2017), and a non-local block (Wang et al. 2018), since it extracts features from sparse edges, which requires large receptive fields and long-range dependencies. As for the backbone network $g_i$ ($i = 1, 2$), we design it as a modified autoencoder architecture (Hinton and Salakhutdinov 2006), by virtue of its excellent context generalization ability for enriching detail contents. We set the number of downsampling/upsampling blocks to 2 for multi-scale observations, embed 3 dense blocks (Huang et al. 2017) in the coarsest layer for more fine-grained contextual information, and add skip-connections to make full use of the shallow features. The downsampling block is a residual bottleneck block (He et al. 2016) enhanced with channel shuffle operations (Zhang et al. 2018) to help the information flow across feature channels. The upsampling block is similar to the corresponding one in the Attention U-Net architecture (Oktay et al. 2018). Note that we add an instance normalization (Ulyanov, Vedaldi, and Lempitsky 2016) layer and a ReLU activation function after each convolution layer.

Figure 3: Qualitative evaluation results on the PLIE dataset among our method, IPLNet (Hu et al. 2020), EnlightenGAN (Jiang et al. 2021), UTVNet (Zheng, Shi, and Shi 2021), and Zero-DCE (Guo et al. 2020). Quantitative results evaluated using PSNR (P) and SSIM (S) are displayed below each image. Please zoom in for better details.

**Loss Function and Training Strategy.** The total loss function of our network is defined as

$$\mathcal{L}(S_{0,1,2}) = \lambda_1 \mathcal{L}_1(S_{0,1,2}) + \lambda_2 \mathcal{L}_2(S_{0,1,2}) + \lambda_3 \mathcal{L}_{\text{tv}}(S_{0,1,2}) + \lambda_4 \mathcal{L}_{\text{grad}}(S_{1,2}),$$

where $\mathcal{L}_1$ is the $\ell_1$ loss, $\mathcal{L}_2$ is the $\ell_2$ loss, $\mathcal{L}_{\text{tv}}$ is the total variation loss to enforce smoothness, and $\mathcal{L}_{\text{grad}}$ is the gradient loss ($\ell_2$ loss in the gradient domain) to ensure the structure invariance of $S_{1,2}$. $\lambda_i$ ($i = 1, 2, 3, 4$) are empirically set to 10.0, 100.0, 1.0, and 100.0, respectively. We implement the network using PyTorch on an NVIDIA 2080Ti GPU and train it for 400 epochs using the ADAM optimizer (Kingma and Ba 2014) with a batch size of 8. The learning rate is set to 0.01.
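To make the two branches concrete, here is a condensed PyTorch sketch of Eqs. (12) and (13). The `Fa`, `Fb`, `Fc`, `g1`, and `g2` submodules are placeholders for the blocks described above (their exact layer configurations are omitted), and the 3×3 Laplace kernel is one standard choice, since the paper only states that "Laplace kernels" are used.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A standard 3x3 Laplace kernel (assumed; the paper does not specify the kernel).
LAPLACE = torch.tensor([[0., 1., 0.],
                        [1., -4., 1.],
                        [0., 1., 0.]]).view(1, 1, 3, 3)

def laplace_edges(x):
    """Extract edge priors E_i by applying the Laplace kernel depthwise."""
    k = LAPLACE.to(x.device, x.dtype).repeat(x.shape[1], 1, 1, 1)
    return F.conv2d(x, k, padding=1, groups=x.shape[1])

class UnpolarizedBranch(nn.Module):
    """Eq. (12): S0 = g1(Fa(gamma * S0_hat)) + gamma * S0_hat."""
    def __init__(self, Fa: nn.Module, g1: nn.Module):
        super().__init__()
        self.Fa, self.g1 = Fa, g1

    def forward(self, s0_hat, gamma):
        x = gamma * s0_hat              # pre-amplified input
        return self.g1(self.Fa(x)) + x  # residual learning (g1 ends with Tanh)

class DifferentialBranch(nn.Module):
    """Eq. (13): Si = g2(concat(Fb(gamma * Si_hat), Fc(Ei))) + gamma * Si_hat."""
    def __init__(self, Fb: nn.Module, Fc: nn.Module, g2: nn.Module):
        super().__init__()
        self.Fb, self.Fc, self.g2 = Fb, Fc, g2

    def forward(self, si_hat, gamma):
        x = gamma * si_hat
        ei = laplace_edges(x)           # edge prior extracted from the noisy input
        return self.g2(torch.cat([self.Fb(x), self.Fc(ei)], dim=1)) + x
```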
## Experiments

### Comparisons with Existing Methods

There is no public dataset for such a polarization-aware low-light image enhancement task. Besides, existing single-image low-light enhancement benchmark datasets (e.g., (Lv, Li, and Lu 2021; Chen et al. 2018)) do not contain any polarization information, so they cannot be used to generate polarized images. Therefore, we propose to build a real-world dataset, named PLIE (Polarization-aware Low-light Image Enhancement) dataset⁵, which contains pairwise low- and normal-light polarized images to train our network and test it quantitatively and qualitatively.

⁵More information can be found in the supplementary material.

| Method | PSNR-$p$ | SSIM-$p$ | PSNR-$\theta$ | SSIM-$\theta$ | PSNR-$S_0$ | SSIM-$S_0$ |
|---|---|---|---|---|---|---|
| **Ours** | **27.15** | **0.765** | **16.42** | **0.336** | **39.19** | **0.977** |
| IPLNet (Hu et al. 2020) | 25.32 | 0.715 | 16.21 | 0.276 | 22.84 | 0.930 |
| EnlightenGAN (Jiang et al. 2021) | 24.55 | 0.652 | 13.55 | 0.190 | 22.14 | 0.887 |
| UTVNet (Zheng, Shi, and Shi 2021) | 24.14 | 0.636 | 12.18 | 0.271 | 18.45 | 0.821 |
| Zero-DCE (Guo et al. 2020) | 19.34 | 0.527 | 12.09 | 0.134 | 17.48 | 0.815 |

Table 1: Quantitative evaluation results on the PLIE dataset among our method, IPLNet (Hu et al. 2020), EnlightenGAN (Jiang et al. 2021), UTVNet (Zheng, Shi, and Shi 2021), and Zero-DCE (Guo et al. 2020). Bold font indicates the best performance.

| Model variant | PSNR-$p$ | SSIM-$p$ | PSNR-$\theta$ | SSIM-$\theta$ | PSNR-$S_0$ | SSIM-$S_0$ |
|---|---|---|---|---|---|---|
| Intensity-domain enhancement | 24.98 | 0.702 | 16.38 | 0.286 | 38.00 | 0.972 |
| Single-branch network | 26.34 | 0.756 | 15.87 | 0.328 | 37.19 | 0.928 |
| Without edge priors | 26.13 | 0.751 | 16.26 | 0.330 | 39.17 | 0.975 |
| Without gradient loss | 20.63 | 0.485 | 14.67 | 0.181 | 37.36 | 0.973 |
| Without total variation loss | 15.70 | 0.576 | 14.35 | 0.250 | 37.40 | 0.970 |
| **Our complete model** | **27.15** | **0.765** | **16.42** | **0.336** | **39.19** | **0.977** |

Table 2: Quantitative evaluation results of the ablation study.

We compare our method to IPLNet⁶ (Hu et al. 2020) (the only existing method designed for enhancing polarized low-light images, as far as we know), and three state-of-the-art single-image low-light enhancement methods, including EnlightenGAN (Jiang et al. 2021), UTVNet (Zheng, Shi, and Shi 2021), and Zero-DCE (Guo et al. 2020), on the PLIE dataset. We do not compare to image denoising methods in this paper since IPLNet (Hu et al. 2020) has already made such comparisons with Polarization-BM3D (Tibbs et al. 2018) (the state-of-the-art polarized image denoising method). Note that comparing with single-image low-light enhancement methods might be a bit unfair because of the difference in the way of processing the input data; we merely attempt to show the significance of polarization-awareness. As Hu et al. (2020) do, we not only evaluate the accuracy of the enhanced DoP $p$ and AoP $\theta$, but also evaluate the quality of the enhanced unpolarized image $S_0$, since one can render a polarized image with an arbitrary polarizer angle using Eq. (3) when $p$, $\theta$, and $S_0$ are available. Note that we only re-train IPLNet (Hu et al. 2020) on the PLIE dataset, while directly adopting the pre-trained models for the single-image low-light enhancement methods (Jiang et al. 2021; Zheng, Shi, and Shi 2021; Guo et al. 2020), since the performance of those single-image methods degenerates after re-training. This is because those methods rely strongly on the semantic information they extract for enhancement; when training on the PLIE dataset (which is not as large as the datasets used for obtaining the pre-trained models), the semantic information is limited.

⁶The code of IPLNet (Hu et al. 2020) is not available, and the demonstrated results are based on our own implementation according to the descriptions in the paper.
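As a reference for reproducing the protocol, the sketch below computes PSNR and SSIM for one polarization-relevant map with the scikit-image implementations, assuming the maps are normalized to $[0, 1]$ as in the paper; whether the paper uses these exact implementations is not stated.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_map(pred, gt):
    """PSNR/SSIM for one polarization-relevant map (p, theta, or S0),
    both given as float arrays normalized to [0, 1]."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    # channel_axis=-1 for RGB maps; use channel_axis=None for single-channel ones.
    ssim = structural_similarity(gt, pred, data_range=1.0, channel_axis=-1)
    return psnr, ssim
```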
Visual quality comparisons are shown in Fig. 3⁷. As for $p$ and $\theta$, our method achieves much better performance than the other methods, thanks to our polarization-aware network. Taking the left box of $p$ as an example, the results of the single-image low-light enhancement methods (Jiang et al. 2021; Zheng, Shi, and Shi 2021; Guo et al. 2020) suffer severely from artifacts, and IPLNet (Hu et al. 2020) tends to generate over-smooth results. This is because the single-image low-light enhancement methods (Jiang et al. 2021; Zheng, Shi, and Shi 2021; Guo et al. 2020) process the polarized images in a frame-by-frame manner, so they are not aware of the polarization relationship among the polarized images, while IPLNet (Hu et al. 2020) still handles the problem in the intensity domain, so the information provided by polarization cannot be effectively exploited. As for $S_0$, our results resemble the ground truth more closely, with less color distortion. This is because the methods operating in the intensity domain (Hu et al. 2020; Jiang et al. 2021; Zheng, Shi, and Shi 2021; Guo et al. 2020) need to compute $S_0$ from $I_{\alpha_{1,2,3,4}}$, while our unpolarized branch focuses on enhancing $S_0$ directly, which avoids error accumulation during the computation. To evaluate the results quantitatively, as Hu et al. (2020) do, we adopt two frequently-used metrics, PSNR and SSIM. Results are shown in Tab. 1 (also below the corresponding examples in Fig. 3). Our model consistently outperforms the compared methods on all metrics.

⁷Additional results can be found in the supplementary material.

### Ablation Study

We conduct a series of ablation studies and show comparisons in Tab. 2. We first show the contribution of our Stokes-domain enhancement pipeline by comparing with a model that performs enhancement in the intensity domain. We find that our Stokes-domain pipeline is better since it can make effective use of the information provided by polarization. We further verify the effectiveness of our dual-branch network structure (Fig. 2) by comparing with a model that uses only a single branch to estimate $S_{0,1,2}$ simultaneously. From the results we can see that our network design is more reasonable and robust. Then, we demonstrate the necessity of extracting the edges of $S_{1,2}$ as priors by removing them, and validate the significance of adopting the gradient loss and the total variation loss by removing them respectively. These results show that our complete model achieves the best performance with the proposed specific designs.

Figure 4: Results of reflection removal (using PRRPAW (Lei et al. 2020)) before and after enhancement by our method and IPLNet (Hu et al. 2020). Quantitative results evaluated using PSNR (P) and SSIM (S) are displayed in each image.

Figure 5: Results of shape from polarization (using DP3I (Deschaintre, Lin, and Ghosh 2021)) before and after enhancement by our method and IPLNet (Hu et al. 2020). Quantitative results evaluated using mean angle error (MAE) are displayed in each image (lower is better).
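For completeness, the mean angle error of Fig. 5 is conventionally the average per-pixel angle between estimated and ground-truth unit normals; below is a minimal sketch of this convention (our formulation, with an optional validity mask; the `1e-12` guard is our addition).

```python
import numpy as np

def mean_angle_error(n_pred, n_gt, mask=None):
    """Mean angle error (MAE, in degrees) between two normal maps of shape
    (H, W, 3). Normals are normalized to unit length before comparison."""
    n1 = n_pred / (np.linalg.norm(n_pred, axis=-1, keepdims=True) + 1e-12)
    n2 = n_gt / (np.linalg.norm(n_gt, axis=-1, keepdims=True) + 1e-12)
    cos = np.clip((n1 * n2).sum(axis=-1), -1.0, 1.0)  # per-pixel cosine similarity
    ang = np.degrees(np.arccos(cos))                  # per-pixel angle in degrees
    return ang[mask].mean() if mask is not None else ang.mean()
```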
## Applications

To demonstrate the benefits of enhancing polarized low-light images, we choose two typical applications, reflection removal (Lei et al. 2020) and shape from polarization (Deschaintre, Lin, and Ghosh 2021), and show that the enhancement can improve their performance. Note that here we only compare to IPLNet (Hu et al. 2020), due to the inferior performance of the single-image low-light enhancement methods (Jiang et al. 2021; Zheng, Shi, and Shi 2021; Guo et al. 2020) on the PLIE dataset according to Tab. 1.

**Reflection Removal.** We choose the state-of-the-art polarization-based reflection removal method PRRPAW (Lei et al. 2020) for validation. First, we use a Lucid Vision Phoenix polarization camera to capture reflection-contaminated polarized low-light images behind a piece of glass with a short exposure time $t_{\text{short}}$, and adopt our method and IPLNet (Hu et al. 2020) to enhance them respectively. Then, we take the enhanced images (converted to grayscale) as the input of PRRPAW (Lei et al. 2020) to obtain the reflection-removed results. For reference, we also capture the reflection-free ground truth images with a long exposure time $t_{\text{long}} = 10\, t_{\text{short}}$ by removing the glass. Comparisons of the reflection-removed grayscale unpolarized images⁸ are shown in Fig. 4⁹. We can see that PRRPAW (Lei et al. 2020) cannot remove the reflection adequately in low-light conditions, and our method improves its performance by providing the enhanced polarized images, DoP, and AoP as its input. Our method outperforms IPLNet (Hu et al. 2020) both quantitatively and qualitatively.

⁸PRRPAW (Lei et al. 2020) takes polarized images as input and outputs unpolarized results in grayscale.
⁹Additional results can be found in the supplementary material.

**Shape from Polarization.** We choose the state-of-the-art shape from polarization method DP3I (Deschaintre, Lin, and Ghosh 2021) for validation. We directly capture the polarized low-light images as the input of our method and IPLNet (Hu et al. 2020). Comparisons of the estimated normal maps are shown in Fig. 5¹⁰ (the ground truth normal map is computed analytically according to the contour of the sphere). We can see that our method improves the performance of DP3I (Deschaintre, Lin, and Ghosh 2021) by a large margin, while IPLNet (Hu et al. 2020) brings negative effects since it generates over-smooth DoP and AoP, providing unreliable physical constraints for such an application.

¹⁰Additional results can be found in the supplementary material.

## Conclusion

We presented a learning-based solution to enhance multiple polarized low-light images for improving the accuracy of the DoP and AoP. To make effective use of the information provided by polarization, we proposed a Stokes-domain enhancement pipeline along with a dual-branch neural network, handling the problem in a polarization-aware manner. We also demonstrated that our method can improve the performance of polarization-based vision applications in low-light conditions, including reflection removal and shape from polarization.

## Acknowledgements

This work is supported by the National Key R&D Program of China (2021ZD0109803), and the National Natural Science Foundation of China under Grants No. 62136001, 62088102, 62276007, and 61876007.

## References

Ai, S.; and Kwon, J. 2020. Extreme low-light image enhancement for surveillance cameras using attention U-Net. Sensors, 20(2): 495.

Atoum, Y.; Ye, M.; Ren, L.; Tai, Y.; and Liu, X. 2020. Color-wise attention network for low-light image enhancement. In Proc. of Computer Vision and Pattern Recognition Workshops.

Chen, C.; Chen, Q.; Do, M. N.; and Koltun, V. 2019. Seeing motion in the dark. In Proc. of Computer Vision and Pattern Recognition.
Chen, C.; Chen, Q.; Xu, J.; and Koltun, V. 2018. Learning to see in the dark. In Proc. of Computer Vision and Pattern Recognition.

Deschaintre, V.; Lin, Y.; and Ghosh, A. 2021. Deep polarization imaging for 3D shape and SVBRDF acquisition. In Proc. of Computer Vision and Pattern Recognition.

Fan, M.; Wang, W.; Yang, W.; and Liu, J. 2020. Integrating semantic segmentation and retinex model for low-light image enhancement. In Proc. of ACM MM.

Guo, C.; Li, C.; Guo, J.; Loy, C. C.; Hou, J.; Kwong, S.; and Cong, R. 2020. Zero-reference deep curve estimation for low-light image enhancement. In Proc. of Computer Vision and Pattern Recognition.

He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proc. of Computer Vision and Pattern Recognition.

Hecht, E. 2012. Optics. Pearson Education India.

Hinton, G. E.; and Salakhutdinov, R. R. 2006. Reducing the dimensionality of data with neural networks. Science, 313(5786): 504–507.

Hu, H.; Lin, Y.; Li, X.; Qi, P.; and Liu, T. 2020. IPLNet: A neural network for intensity-polarization imaging in low light. Optics Letters, 45(22): 6162–6165.

Huang, G.; Liu, Z.; Van Der Maaten, L.; and Weinberger, K. Q. 2017. Densely connected convolutional networks. In Proc. of Computer Vision and Pattern Recognition.

Jiang, H.; and Zheng, Y. 2019. Learning to see moving objects in the dark. In Proc. of International Conference on Computer Vision.

Jiang, Y.; Gong, X.; Liu, D.; Cheng, Y.; Fang, C.; Shen, X.; Yang, J.; Zhou, P.; and Wang, Z. 2021. EnlightenGAN: Deep light enhancement without paired supervision. IEEE Transactions on Image Processing, 30: 2340–2349.

Kingma, D. P.; and Ba, J. 2014. ADAM: A method for stochastic optimization. arXiv:1412.6980.

Können, G. 1985. Polarized light in nature. CUP Archive.

Lamba, M.; and Mitra, K. 2021. Restoring extremely dark images in real time. In Proc. of Computer Vision and Pattern Recognition.

Land, E. H. 1977. The retinex theory of color vision. Scientific American, 237(6): 108–129.

Lei, C.; Huang, X.; Zhang, M.; Yan, Q.; Sun, W.; and Chen, Q. 2020. Polarized reflection removal with perfect alignment in the wild. In Proc. of Computer Vision and Pattern Recognition.

Li, C.; Guo, C.; and Chen, C. L. 2021. Learning to enhance low-light image via zero-reference deep curve estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Li, C.; Guo, J.; Porikli, F.; and Pang, Y. 2018. LightenNet: A convolutional neural network for weakly illuminated image enhancement. Pattern Recognition Letters, 104: 15–22.

Li, J.; Feng, X.; and Hua, Z. 2021. Low-light image enhancement via progressive-recursive network. IEEE Transactions on Circuits and Systems for Video Technology, 31(11): 4227–4240.

Li, J.; Li, J.; Fang, F.; Li, F.; and Zhang, G. 2020. Luminance-aware pyramid network for low-light image enhancement. IEEE Transactions on Multimedia, 23: 3153–3165.

Lim, S.; and Kim, W. 2020. DSLR: Deep stacked laplacian restorer for low-light image enhancement. IEEE Transactions on Multimedia, 23: 4272–4284.

Liu, R.; Ma, L.; Zhang, J.; Fan, X.; and Luo, Z. 2021. Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement. In Proc. of Computer Vision and Pattern Recognition.

Lore, K. G.; Akintayo, A.; and Sarkar, S. 2017. LLNet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognition, 61: 650–662.
Lu, K.; and Zhang, L. 2020. TBEFN: A two-branch exposure-fusion network for low-light image enhancement. IEEE Transactions on Multimedia, 23: 4093–4105.

Lv, F.; Li, Y.; and Lu, F. 2021. Attention guided low-light image enhancement with a large scale low-light simulation dataset. International Journal of Computer Vision, 129(7): 2175–2193.

Lv, F.; Liu, B.; and Lu, F. 2020. Fast enhancement for non-uniform illumination images using light-weight CNNs. In Proc. of ACM MM.

Lv, F.; Lu, F.; Wu, J.; and Lim, C. 2018. MBLLEN: Low-light image/video enhancement using CNNs. In Proc. of British Machine Vision Conference.

Lyu, Y.; Cui, Z.; Li, S.; Pollefeys, M.; and Shi, B. 2019. Reflection separation using a pair of unpolarized and polarized images. In Proc. of Advances in Neural Information Processing Systems.

Maharjan, P.; Li, L.; Li, Z.; Xu, N.; Ma, C.; and Li, Y. 2019. Improving extreme low-light image denoising via residual learning. In Proc. of International Conference on Multimedia and Expo.

Oktay, O.; Schlemper, J.; Folgoc, L. L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N. Y.; Kainz, B.; Glocker, B.; and Rueckert, D. 2018. Attention U-Net: Learning where to look for the pancreas. arXiv:1804.03999.

Pizer, S. M.; Amburn, E. P.; Austin, J. D.; Cromartie, R.; Geselowitz, A.; Greer, T.; ter Haar Romeny, B.; Zimmerman, J. B.; and Zuiderveld, K. 1987. Adaptive histogram equalization and its variations. Computer Vision, Graphics, and Image Processing, 39(3): 355–368.

Ren, W.; Liu, S.; Ma, L.; Xu, Q.; Xu, X.; Cao, X.; Du, J.; and Yang, M.-H. 2019. Low-light image enhancement via a deep hybrid network. IEEE Transactions on Image Processing, 28(9): 4364–4375.

Tibbs, A. B.; Daly, I. M.; Roberts, N. W.; and Bull, D. R. 2018. Denoising imaging polarimetry by adapted BM3D method. Journal of the Optical Society of America, 35(4): 690–701.

Triantafyllidou, D.; Moran, S.; McDonagh, S.; Parisot, S.; and Slabaugh, G. 2020. Low light video enhancement using synthetic data produced with an intermediate domain mapping. In Proc. of European Conference on Computer Vision.

Ulyanov, D.; Vedaldi, A.; and Lempitsky, V. 2016. Instance normalization: The missing ingredient for fast stylization. arXiv:1607.08022.

Wang, L.-W.; Liu, Z.-S.; Siu, W.-C.; and Lun, D. P. 2020. Lightening network for low-light image enhancement. IEEE Transactions on Image Processing, 29: 7984–7996.

Wang, R.; Xu, X.; Fu, C.-W.; Lu, J.; Yu, B.; and Jia, J. 2021. Seeing dynamic scene in the dark: A high-quality video dataset with mechatronic alignment. In Proc. of International Conference on Computer Vision.

Wang, R.; Zhang, Q.; Fu, C.-W.; Shen, X.; Zheng, W.-S.; and Jia, J. 2019a. Underexposed photo enhancement using deep illumination estimation. In Proc. of Computer Vision and Pattern Recognition.

Wang, X.; Girshick, R.; Gupta, A.; and He, K. 2018. Non-local neural networks. In Proc. of Computer Vision and Pattern Recognition.

Wang, Y.; Cao, Y.; Zha, Z.-J.; Zhang, J.; Xiong, Z.; Zhang, W.; and Wu, F. 2019b. Progressive retinex: Mutually reinforced illumination-noise perception network for low-light image enhancement. In Proc. of ACM MM.

Wei, C.; Wang, W.; Yang, W.; and Liu, J. 2018. Deep retinex decomposition for low-light enhancement. In Proc. of British Machine Vision Conference.

Wei, K.; Fu, Y.; Yang, J.; and Huang, H. 2020. A physics-based noise formation model for extreme low-light raw denoising. In Proc. of Computer Vision and Pattern Recognition.

Xu, K.; Yang, X.; Yin, B.; and Lau, R. W. 2020. Learning to restore low-light images via decomposition-and-enhancement. In Proc. of Computer Vision and Pattern Recognition.
Yang, W.; Wang, S.; Fang, Y.; Wang, Y.; and Liu, J. 2020. From fidelity to perceptual quality: A semi-supervised approach for low-light image enhancement. In Proc. of Computer Vision and Pattern Recognition.

Yang, W.; Wang, S.; Fang, Y.; Wang, Y.; and Liu, J. 2021a. Band representation-based semi-supervised low-light image enhancement: Bridging the gap between signal fidelity and perceptual quality. IEEE Transactions on Image Processing, 30: 3461–3473.

Yang, W.; Wang, W.; Huang, H.; Wang, S.; and Liu, J. 2021b. Sparse gradient regularized deep retinex network for robust low-light image enhancement. IEEE Transactions on Image Processing, 30: 2072–2086.

Yu, R.; Liu, W.; Zhang, Y.; Qu, Z.; Zhao, D.; and Zhang, B. 2018. DeepExposure: Learning to expose photos with asynchronously reinforced adversarial learning. In Proc. of Advances in Neural Information Processing Systems.

Zhang, F.; Li, Y.; You, S.; and Fu, Y. 2021a. Learning temporal consistency for low light video enhancement from single images. In Proc. of Computer Vision and Pattern Recognition.

Zhang, L.; Zhang, L.; Liu, X.; Shen, Y.; Zhang, S.; and Zhao, S. 2019. Zero-shot restoration of back-lit images using deep internal learning. In Proc. of ACM MM.

Zhang, X.; Zhou, X.; Lin, M.; and Sun, J. 2018. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proc. of Computer Vision and Pattern Recognition.

Zhang, Y.; Guo, X.; Ma, J.; Liu, W.; and Zhang, J. 2021b. Beyond brightening low-light images. International Journal of Computer Vision, 129(4): 1013–1037.

Zhang, Y.; Zhang, J.; and Guo, X. 2019. Kindling the darkness: A practical low-light image enhancer. In Proc. of ACM MM.

Zhao, Z.; Xiong, B.; Wang, L.; Ou, Q.; Yu, L.; and Kuang, F. 2021. RetinexDIP: A unified deep framework for low-light image enhancement. IEEE Transactions on Circuits and Systems for Video Technology, 32(3): 1076–1088.

Zheng, C.; Shi, D.; and Shi, W. 2021. Adaptive unfolding total variation network for low-light image enhancement. In Proc. of International Conference on Computer Vision.

Zhou, C.; Teng, M.; Han, Y.; Xu, C.; and Shi, B. 2021. Learning to dehaze with polarization. In Proc. of Advances in Neural Information Processing Systems.

Zhu, A.; Zhang, L.; Shen, Y.; Ma, Y.; Zhao, S.; and Zhou, Y. 2020a. Zero-shot restoration of underexposed images via robust retinex decomposition. In Proc. of International Conference on Multimedia and Expo.

Zhu, M.; Pan, P.; Chen, W.; and Yang, Y. 2020b. EEMEFN: Low-light image enhancement via edge-enhanced multi-exposure fusion network. In Proc. of the AAAI Conference on Artificial Intelligence.