# Learning Image Demoiréing from Unpaired Real Data

Yunshan Zhong¹,², Yuyao Zhou²,³, Yuxin Zhang²,³, Fei Chao²,³, Rongrong Ji¹,²,³,⁴*

¹Institute of Artificial Intelligence, Xiamen University. ²Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University. ³Department of Artificial Intelligence, School of Informatics, Xiamen University. ⁴Peng Cheng Laboratory.

{zhongyunshan, yuyaozhou, yuxinzhang}@stu.xmu.edu.cn, {fchao, rrji}@xmu.edu.cn

*Corresponding Author. Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

This paper focuses on addressing the issue of image demoiréing. Unlike the large volume of existing studies that rely on learning from paired real data, we attempt to learn a demoiréing model from unpaired real data, i.e., moiré images associated with irrelevant clean images. The proposed method, referred to as Unpaired Demoiréing (UnDeM), synthesizes pseudo moiré images from unpaired datasets, generating pairs with clean images for training demoiréing models. To achieve this, we divide real moiré images into patches and group them in compliance with their moiré complexity. We introduce a novel moiré generation framework to synthesize moiré images with diverse moiré features, resembling real moiré patches, and details akin to real moiré-free images. Additionally, we introduce an adaptive denoise method to eliminate low-quality pseudo moiré images that adversely impact the learning of demoiréing models. We conduct extensive experiments on the commonly used FHDMi and UHDM datasets. Results show that our UnDeM performs better than existing methods when using existing demoiréing models such as MBCNN and ESDNet-L. Code: https://github.com/zysxmu/UnDeM

## Introduction

Contemporary society is awash with electronic screens for presenting images, text, video, etc. With the widespread availability of portable camera devices such as smartphones, people have grown accustomed to using them for quick information recording. Unfortunately, a common issue arises from the intrinsic interference between the camera's color filter array (CFA) and the LCD subpixel layout of the screen (Yu et al. 2022), resulting in captured pictures contaminated with rainbow-shaped stripes, also known as moiré patterns (Sun, Yu, and Wang 2018; Yang et al. 2017b). These moiré patterns vary in thickness, frequency, layout, and color, degrading the perceptual quality of captured pictures. Consequently, there has been considerable academic and industrial interest in developing demoiréing algorithms to rectify the issue.

Figure 1: Illustration of image moiré. Natural moiré patterns are complex, varying in thickness, frequency, layout, and color across images and within a single image.

Early research on demoiréing was mostly built upon image priors (Dabov et al. 2007; Cho et al. 2011) or traditional machine learning methods (Liu, Yang, and Yue 2015; Yang et al. 2017a), which have been demonstrated to be inadequate for tackling moiré patterns of drastic variation (Zheng et al. 2021). Fortunately, convolutional neural networks (CNNs) have become the de facto infrastructure behind the success of various computer vision tasks, including image demoiréing (He et al. 2020; Cheng, Fu, and Yang 2019; He et al. 2019; Liu et al. 2020; Sun, Yu, and Wang 2018; Yuan et al. 2019; Zheng et al. 2021; Yu et al. 2022; Liu, Shu, and Wu 2018; Gao et al. 2019).
These CNN-based methods are typically trained on extensive pairs of moiré-free and moiré images in a supervised manner to model the demoiréing mapping. However, it is challenging to collect paired images given the fact, illustrated in Fig. 1, that natural moiré patterns feature varying thicknesses, frequencies, layouts, and colors (Zheng et al. 2021). We can easily access moiré images as well as moiré-free images, but they are mostly unpaired. Although many studies try to capture image pairs from digital screens (He et al. 2020; Yu et al. 2022), their quality is constrained by three limitations. First, acquiring high-quality image pairs involves professional camera position adjustments and even special hardware (Yu et al. 2022). Second, considerable manual effort is required to select well-aligned moiré-free and moiré pairs. Third, the moiré contents captured under highly controlled lab environments are homogeneous, whereas image pairs with more diverse moiré patterns are preferable for improving demoiréing models.

Synthesizing moiré images has therefore attracted increasing attention recently. Given moiré-free screenshots as illustrated in Fig. 2a, shooting simulation methods (Liu, Shu, and Wu 2018; Yuan et al. 2019; Niu, Guo, and Wang 2021) simulate the aliasing between the CFA and the screen's LCD subpixels to produce the corresponding paired moiré images in Fig. 2b. However, the synthetic images are insufficient to capture the characteristics of real moiré patterns, leading to a large domain gap, which we analyze from two aspects. First, the synthetic moiré images are much darker and cannot capture the light quality well, destroying the context of the viewing environment and obscuring image details. Second, the synthetic moiré patterns lack authenticity, as the thicknesses, frequencies, layouts, and colors of moiré stripes are almost the same within an image. In Table 1 and Table 2 of the experimental section, we apply shooting-simulated moiré images to train demoiréing CNNs; our results show that the trained models generalize poorly to natural-world test datasets. In (Park et al. 2022), Park et al. introduced a cyclic moiré learning method, and we observe better performance than shooting simulation in Table 1 and Table 2. However, the generated pseudo moiré fails to accurately model real moiré patterns, as illustrated in Fig. 2c, leading to limited performance. Therefore, a better method for synthesizing moiré images is desired.

In this paper, we present a novel method, dubbed UnDeM, to learn demoiréing from unpaired real moiré and clean images, which are fairly easy to collect, for example, by performing random screenshots and taking random photos of a digital screen. As displayed in Fig. 2d, the basic objective of our UnDeM is to synthesize moiré images that possess moiré features resembling the real moiré images and details matching the real moiré-free images. The synthesized pseudo moiré images then form pairs with the real moiré-free images for training demoiréing networks. To this end, as shown in Fig. 3, we first split images into patches. These moiré patches are further grouped using a moiré prior that takes into consideration the frequency and color information of each patch (Zhang et al. 2023).
Consequently, moiré patches within each group fall into similar complexity levels, such that each group can be better handled by an individual moiré synthesis network. Specifically, the introduced synthesis network contains four modules: a moiré feature encoder to extract moiré features of real moiré patches, a generator to synthesize pseudo moiré patches, a discriminator to identify real or pseudo moiré patches, and a content encoder to retain the content information of real clean patches in the synthesized pseudo moiré patches. The whole framework is trained in an adversarial manner (Goodfellow et al. 2014) for better moiré image generation. Before being paired with real moiré-free images for training demoiréing networks, the synthesized moiré patches further undergo an adaptive denoise process to rule out low-quality moiré patches that suffer from image detail loss. Concretely, we find that low-quality pseudo moiré leads to a large structure difference from its moiré-free counterpart, and it can therefore be removed if the difference score is beyond a threshold adaptive to a particular percentile of the overall structure differences.

Experiments in Table 1 and Table 2 demonstrate that the proposed UnDeM improves over the compared baselines by a large margin on real moiré image datasets. For example, when trained with a crop size of 384, MBCNN (Zheng et al. 2020) trained on the synthetic images from our UnDeM achieves 19.89 dB PSNR on FHDMi (He et al. 2020), versus 19.36 dB from cyclic moiré learning (Park et al. 2022) and only 9.32 dB from shooting simulation. Such results not only demonstrate our efficacy but also offer the demoiréing community a new moiré generation method.

## Related Work

### Image Demoiréing

Image demoiréing targets removing moiré patterns from captured photos. Earlier studies resort to assumed properties of moiré patterns, such as space-variant filters (Siddiqui, Boutin, and Bouman 2009; Sun, Li, and Sun 2014), low-rank constrained sparse matrix decomposition (Liu, Yang, and Yue 2015; Yang et al. 2017a), and layer decomposition (Yang et al. 2017b). Along with the surge of deep learning across computer vision tasks, demoiréing has recently benefited from convolutional neural networks (CNNs). As the pioneering study, Sun et al. (Sun, Yu, and Wang 2018) developed DMCNN, a multi-scale CNN, to remove moiré patterns at different frequencies and scales. He et al. (He et al. 2019) proposed MopNet, which is specially designed for unique properties of moiré patterns, including frequencies, colors, and appearances. Zheng et al. (Zheng et al. 2020) introduced a multi-scale bandpass convolutional neural network (MBCNN) that consists of a learnable bandpass filter and a two-step tone-mapping strategy to respectively handle the frequency prior and color shift. Liu et al. (Liu et al. 2020) designed WDNet, which removes moiré patterns in the wavelet domain to effectively separate moiré patterns from image details. In (He et al. 2020), a multi-stage framework, FHDe2Net, is proposed; it employs a global-to-local cascaded removal branch to erase multi-scale moiré patterns and a frequency-based branch to preserve fine details. Yu et al. (Yu et al. 2022) designed ESDNet, which utilizes a computationally efficient semantic-aligned scale-aware module to enhance the network's capability. However, all these approaches require large amounts of moiré and moiré-free pairs.
To solve this limitation, a cycle loss has been constructed to simultaneously train a pseudo moiré generator and a demoiréing network (Park et al. 2022; Yue et al. 2021). Very differently, our proposed UnDeM does not involve a demoiréing network in the moiré synthesis stage.

### Moiréing Dataset

Since data-driven CNN-based algorithms require large amounts of paired moiré and moiré-free images for training, many efforts have been devoted to constructing large-scale image pairs. Sun et al. (Sun, Yu, and Wang 2018) built the first real-world moiré image dataset from ImageNet (Russakovsky et al. 2015). He et al. (He et al. 2020) proposed the first high-resolution moiré image dataset, FHDMi, to satisfy practical applications in the real world. Yu et al. (Yu et al. 2022) further proposed the ultra-high-definition demoiréing dataset UHDM, containing 4K images. Nevertheless, the data preparation process requires huge human effort, and the resulting datasets are confined to limited scenes. To avoid the drudgery of collecting real-world paired moiré and moiré-free images, shooting simulation, which simulates the camera imaging process, has become a valuable alternative (Liu, Shu, and Wu 2018; Yuan et al. 2019). However, the synthetic data fails to model the real imaging process, leading to a large domain gap between synthetic and real data. As a result, demoiréing models trained on synthetic data are incapable of handling real-world scenarios.

Figure 2: Visual examples of (a) real moiré-free images; (b) pseudo moiré images by shooting simulation (Niu, Guo, and Wang 2021); (c) pseudo moiré images by cyclic learning (Park et al. 2022); (d) pseudo moiré images by our UnDeM. Compared with the detail-missing and inauthentic pseudo moiré images of shooting simulation and (Park et al. 2022), ours exhibit more diverse moiré patterns and preserve more details of the moiré-free images. Best viewed by zooming in.

## Methodology

Our UnDeM comprises image preprocessing, a moiré synthesis network, and adaptive denoise, which are detailed one by one in the following.

### Image Preprocessing

Moiré patterns vary significantly even within a single image, and it is challenging for a single network to learn all cases. To better learn from these different moiré patterns, we apply an individual moiré synthesis network to each set of moiré patterns of similar complexity. We first split the images in the moiré set $I^m$ into non-overlapping patches, leading to a moiré patch set $\mathcal{P}^m = \{p^m_i\}_{i=1}^{N}$, where $N$ is the number of patches in the whole moiré patch set. Similarly, we obtain an $M$-size moiré-free patch set $\mathcal{P}^f = \{p^f_i\}_{i=1}^{M}$ for $I^f$. As illustrated in Fig. 3, we divide the moiré set $\mathcal{P}^m$ into $K$ subsets, $\mathcal{P}^m = \mathcal{P}^m_1 \cup \mathcal{P}^m_2 \cup \dots \cup \mathcal{P}^m_K$, where each $\mathcal{P}^m_j$ contains moiré patches of similar complexity and any two subsets are disjoint.

Figure 3: Image preprocessing. Both moiré images $I^m$ and unpaired moiré-free images $I^f$ are split into patches. Patches from moiré images are further grouped in compliance with the complexity of their moiré patterns.

Zhang et al. (Zhang et al. 2023) showed that a perceptible moiré pattern is highlighted by either high frequency or rich color information. Following (Zhang et al. 2023), given a moiré patch $p^m \in \mathcal{P}^m$, its frequency is measured by a Laplacian edge detection operator $F(p^m)$ with a kernel size of 3 (Marr and Hildreth 1980). In addition, its colorfulness, denoted as $C(p^m)$, is a linear combination of the mean and standard deviation of the pixel cloud in the color planes of the RGB colour space (Hasler and Suesstrunk 2003):

$$C(p^m) = \sqrt{\sigma^2(p^m_R - p^m_G) + \sigma^2\big(0.5(p^m_R + p^m_G) - p^m_B\big)} + 0.3\sqrt{\mu^2(p^m_R - p^m_G) + \mu^2\big(0.5(p^m_R + p^m_G) - p^m_B\big)}, \tag{1}$$

where $\sigma(\cdot)$ and $\mu(\cdot)$ return the standard deviation and mean value of their inputs, and $p^m_R$, $p^m_G$, and $p^m_B$ denote the red, green, and blue color channels of $p^m$.

We set $K = 4$ and obtain four evenly sized subsets of moiré patches, each with distinctive moiré features. The first group $\mathcal{P}^m_1$ contains the $N/4$ patches with the smallest $F(p^m) \cdot C(p^m)$, and thus holds moiré patterns of low frequency and less color. We sort the remaining patches from smallest to largest by a new metric, $F(p^m)/C(p^m)$. Then $\mathcal{P}^m_2$ consists of the first $N/4$ patches, highlighted by low frequency but rich color. The middle $N/4$ patches form $\mathcal{P}^m_3$, featuring high frequency and rich color. The $N/4$ largest-scored patches, with high frequency but less color, make up $\mathcal{P}^m_4$. Fig. 4 gives some visual examples.

Figure 4: An illustration of moiré patches in each group. Each group has its own moiré pattern complexity.
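To make the grouping concrete, below is a minimal Python sketch, not the authors' released code. It assumes OpenCV/NumPy, reduces the Laplacian response to a scalar by its mean absolute value (our assumption, since the paper does not state the reduction), and uses the standard 0.3 weighting of the colorfulness metric (Hasler and Suesstrunk 2003).

```python
import cv2
import numpy as np

def frequency_score(patch_bgr):
    """F(p): mean absolute response of a 3x3 Laplacian edge detector."""
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
    return float(np.abs(cv2.Laplacian(gray, cv2.CV_64F, ksize=3)).mean())

def colorfulness_score(patch_bgr):
    """C(p): colorfulness of Eq. (1) (Hasler and Suesstrunk 2003)."""
    b, g, r = cv2.split(patch_bgr.astype(np.float64))
    rg, yb = r - g, 0.5 * (r + g) - b
    return float(np.sqrt(rg.std() ** 2 + yb.std() ** 2)
                 + 0.3 * np.sqrt(rg.mean() ** 2 + yb.mean() ** 2))

def group_patches(patches):
    """Split moire patches into the four complexity groups P1..P4."""
    F = np.array([frequency_score(p) for p in patches])
    C = np.array([colorfulness_score(p) for p in patches]) + 1e-8
    n = len(patches) // 4
    by_product = np.argsort(F * C)                  # ascending F(p) * C(p)
    g1 = by_product[:n]                             # low frequency, less color
    rest = by_product[n:]
    by_ratio = rest[np.argsort(F[rest] / C[rest])]  # ascending F(p) / C(p)
    g2, g3, g4 = by_ratio[:n], by_ratio[n:2 * n], by_ratio[2 * n:]
    return [[patches[i] for i in g] for g in (g1, g2, g3, g4)]
```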
### Moiré Synthesis Network

Fig. 5 depicts the overall framework of our moiré synthesis network $T_i$, which learns moiré patterns from the group $\mathcal{P}^m_i$. It consists of a moiré feature encoder $E_m$, a generator $G_m$, a discriminator $D_m$, and a content encoder $E_c$. Given an unpaired moiré patch $p^m \in \mathcal{P}^m_i$ and a moiré-free patch $p^f \in \mathcal{P}^f$, our motivation is to produce a pseudo moiré patch $\tilde{p}^m$ that possesses the moiré pattern of $p^m$ while retaining the image details of $p^f$, such that $(\tilde{p}^m, p^f)$ forms a moiré and moiré-free pair to guide the learning of existing demoiréing networks.

To fulfill this objective, the moiré feature encoder $E_m$ extracts the moiré features of the real moiré patch $p^m$, denoted as $F^m$:

$$F^m = E_m(p^m). \tag{2}$$

Then, the generator $G_m$ synthesizes a pseudo moiré patch $\tilde{p}^m$ with $F^m$ and $p^f$ as its inputs:

$$\tilde{p}^m = G_m\big(\mathrm{Con}(F^m, p^f)\big), \tag{3}$$

where $\mathrm{Con}(\cdot,\cdot)$ indicates the concatenation operation. The discriminator $D_m$ cooperates with the generator $G_m$ in an adversarial training manner (Goodfellow et al. 2014) for better pseudo moiré patches. The generator $G_m$ is trained to trick the discriminator $D_m$ by:

$$\mathcal{L}_{\text{dis-}G} = \big(D_m(\tilde{p}^m) - 1\big)^2. \tag{4}$$

The least squares loss function (Mao et al. 2017) is used for better training stability. Also, $D_m$ is trained to distinguish the pseudo moiré patch $\tilde{p}^m$ from the real $p^m$:

$$\mathcal{L}_{\text{dis-}D} = D_m(\tilde{p}^m)^2 + \big(D_m(p^m) - 1\big)^2. \tag{5}$$

The loss functions of Eq. (4) and Eq. (5) are optimized in a min-max game. As a result, $D_m$ learns to distinguish pseudo moiré from real moiré images, while the moiré feature encoder $E_m$ is forced to extract moiré features appropriately and the generator $G_m$ learns to synthesize real-looking, in-distribution pseudo moiré images. In addition, we require the moiré features of the synthesized $\tilde{p}^m$ to follow those of the real $p^m$:

$$\tilde{F}^m = E_m(\tilde{p}^m), \tag{6}$$

$$\mathcal{L}_{\text{fea}} = \|\tilde{F}^m - F^m\|_1, \tag{7}$$

where $\|\cdot\|_1$ denotes the $\ell_1$ loss. To properly pair $\tilde{p}^m$ and $p^f$, $\tilde{p}^m$ is also expected to retain the content details of $p^f$. An additional content encoder $E_c$ is introduced to align the content features of $\tilde{p}^m$ and $p^f$:

$$\mathcal{L}_{\text{con}} = \|E_c(\tilde{p}^m) - E_c(p^f)\|_1. \tag{8}$$

Combining Eq. (4), Eq. (5), Eq. (7), and Eq. (8) leads to our final loss function:

$$\mathcal{L} = \mathcal{L}_{\text{dis-}G} + \mathcal{L}_{\text{dis-}D} + \mathcal{L}_{\text{fea}} + \mathcal{L}_{\text{con}}. \tag{9}$$
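To make the objective concrete, the PyTorch sketch below assembles the losses of Eqs. (2)-(9). It is a simplified illustration under our own assumptions: concatenation along the channel dimension for $\mathrm{Con}(\cdot,\cdot)$, unit weights between the four terms as Eq. (9) suggests, and the usual practice of detaching the generator output in the discriminator loss; in the actual min-max game, the generator/encoders and the discriminator are updated by alternating optimizers.

```python
import torch
import torch.nn.functional as F

def synthesis_losses(E_m, G_m, D_m, E_c, p_m, p_f):
    """Compute the UnDeM losses for one batch of unpaired patches.
    p_m: real moire patches; p_f: real moire-free patches."""
    F_m = E_m(p_m)                                  # Eq. (2): moire features
    p_fake = G_m(torch.cat([F_m, p_f], dim=1))      # Eq. (3): pseudo moire
    # Eq. (4): least-squares GAN loss pushing D_m(p_fake) toward 1
    L_dis_G = (D_m(p_fake) - 1).pow(2).mean()
    # Eq. (5): D_m separates pseudo (toward 0) from real moire (toward 1)
    L_dis_D = D_m(p_fake.detach()).pow(2).mean() + (D_m(p_m) - 1).pow(2).mean()
    # Eqs. (6)-(7): pseudo moire must carry the same moire features
    L_fea = F.l1_loss(E_m(p_fake), F_m)
    # Eq. (8): pseudo moire must keep the content of the clean patch
    L_con = F.l1_loss(E_c(p_fake), E_c(p_f))
    # Eq. (9): the overall objective is the sum of the four terms
    return L_dis_G, L_dis_D, L_fea, L_con
```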
Figure 5: Framework of our moiré synthesis network (moiré feature encoder, content encoder, generator, and discriminator operating on the real moiré patch $p^m$, the moiré-free patch $p^f$, and the synthesized moiré patch $\tilde{p}^m$).

Figure 6: Examples of low-quality pseudo moiré images: (a) moiré-free patches $p^f$; (b) low-quality pseudo moiré patches $\tilde{p}^m$.

### Adaptive Denoise

After training our moiré synthesis networks $\{T_i\}_{i=1}^{4}$, the pseudo moiré patches $\tilde{p}^m$, paired with their corresponding moiré-free patches $p^f$, form the dataset for training demoiréing networks. Unfortunately, we find that some pseudo moiré patches occasionally suffer from low quality. Some examples are shown in Fig. 6, where the contents and details of $p^f$ are destroyed in $\tilde{p}^m$. Such noisy data hinders the learning of demoiréing models. Fortunately, we observe in Fig. 6 that the ruined structure mostly manifests in the edge information. Therefore, we calculate the edge map of each patch with the Laplacian edge detection operator, and the structure difference is computed by summing the absolute values of the edge differences within each pseudo pair. Low-quality pseudo moiré leads to a large structure-difference score, so we can rule out such pairs whenever the score is beyond a threshold adaptive to the $\gamma$-th percentile of the structure differences over a total of $N$ pseudo pairs. We conduct the above process for each synthesis network $T_i$, setting a corresponding $\gamma_i$ to remove low-quality pseudo moiré. We find that $N = 6{,}400$ already performs well. Consequently, we obtain better performance.
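A possible implementation of this filter is sketched below. We assume the same 3×3 Laplacian kernel used for the moiré prior (the paper does not specify the kernel size here), and express the rule as keeping a pseudo pair only if its structure difference is at or below the $\gamma$-th percentile of the $N$ scores.

```python
import cv2
import numpy as np

def edge_map(img_bgr):
    """Laplacian edge map of a patch (3x3 kernel, as in the moire prior)."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F, ksize=3)

def adaptive_denoise(pairs, gamma):
    """Filter (pseudo moire, moire-free) pairs by structure difference.
    pairs: list of (p_pseudo, p_clean); gamma: percentile, e.g. 50/40/30/20."""
    scores = np.array([np.abs(edge_map(pm) - edge_map(pf)).sum()
                       for pm, pf in pairs])
    threshold = np.percentile(scores, gamma)   # adaptive threshold
    return [pair for pair, s in zip(pairs, scores) if s <= threshold]
```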
In summary, our UnDeM consists of 1) training the moiré synthesis networks to synthesize pseudo moiré images, and 2) training a demoiréing model on the outputs of the trained moiré synthesis networks. This paper focuses on moiré image generation; for demoiréing models, we directly borrow from existing studies. Details of the training algorithms are listed in the supplementary materials.

## Experiments

### Implementation Details

Datasets. Public demoiréing datasets used in this paper include the FHDMi (He et al. 2020) and UHDM (Yu et al. 2022) datasets. The FHDMi dataset consists of 9,981 image pairs for training and 2,019 image pairs for testing at 1920×1080 resolution. The UHDM dataset contains 5,000 image pairs at 4K resolution in total, of which 4,500 are used for training and 500 for testing. We use the training sets to train the proposed moiré synthesis network. For image preprocessing, we crop the training images of FHDMi into 8 patches; for UHDM, which involves images of higher resolution, we crop the training images into 6 patches. During training, the moiré patch $p^m$ and moiré-free patch $p^f$ are selected from different original images (before image preprocessing) to ensure they are unpaired.

Networks. We implement our UnDeM using the PyTorch framework (Paszke et al. 2019). The architecture of the moiré synthesis network largely follows (Hu et al. 2019; Liu et al. 2021). $E_m$ and $E_c$ contain one convolutional layer and two residual blocks. $G_m$ contains three convolutional layers, nine residual blocks, and two deconvolutional layers, and ends with a convolutional layer that produces the final output. Each residual block consists of two convolutional layers, each followed by instance normalization (Ulyanov, Vedaldi, and Lempitsky 2016) and a ReLU function. The convolutional layers have 16 channels for $E_m$ and $E_c$ and 128 for $G_m$. $D_m$ is borrowed from PatchGAN (Isola et al. 2017) and comprises three convolutional layers with a stride of 2 and two convolutional layers with a stride of 1, ending with an average pooling layer. For demoiréing models, we utilize MBCNN (Zheng et al. 2020) and ESDNet-L (a large version of ESDNet) (Yu et al. 2022).

The moiré synthesis network is trained using the Adam optimizer (Kingma and Ba 2014), with the first and second momenta set to 0.9 and 0.999, respectively. We train for 100 epochs with a batch size of 4 and an initial learning rate of $2 \times 10^{-4}$, linearly decayed to 0 over the last 50 epochs. Besides, we apply different random crop sizes to the image patches after image preprocessing to validate the flexibility of our method for synthesizing pseudo moiré images: 192×192 and 384×384 for FHDMi, and 192×192, 384×384, and 768×768 for UHDM. As for the demoiréing models, we retain the same training configurations as the original papers, except that all models are trained for 150 epochs for a fair comparison. All networks are initialized from a Gaussian distribution with a mean of 0 and a standard deviation of 0.02. The $\gamma_1$, $\gamma_2$, $\gamma_3$, and $\gamma_4$ for adaptive denoise are empirically set to 50, 40, 30, and 20, respectively (ablations on $\gamma_i$ and each component of UnDeM are provided in the supplementary materials). All experiments are run on NVIDIA A100 GPUs.

Evaluation Protocols. We adopt the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM) (Wang et al. 2004), and LPIPS (Zhang et al. 2018) to quantitatively evaluate the performance of demoiréing models.
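For reproducibility, a typical way to compute the three metrics is sketched below. This is our own illustration, assuming scikit-image for PSNR/SSIM and the lpips package with its common AlexNet backbone (the paper does not state which LPIPS backbone is used).

```python
import numpy as np
import torch
import lpips  # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net='alex')  # backbone choice is our assumption

def to_tensor(img):
    """HxWx3 uint8 -> 1x3xHxW float tensor in [-1, 1], as LPIPS expects."""
    t = torch.from_numpy(img.astype(np.float32) / 127.5 - 1.0)
    return t.permute(2, 0, 1).unsqueeze(0)

def evaluate(pred, gt):
    """pred, gt: HxWx3 uint8 demoireing result and ground truth."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    ssim = structural_similarity(gt, pred, channel_axis=2, data_range=255)
    with torch.no_grad():
        lp = lpips_fn(to_tensor(pred), to_tensor(gt)).item()
    return psnr, ssim, lp
```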
### Quantitative Results

FHDMi. We first analyze the performance on the FHDMi dataset by comparing our UnDeM against the baselines, i.e., shooting simulation (Niu, Guo, and Wang 2021) and cyclic learning (Park et al. 2022). Table 1 shows that the performance of demoiréing models trained on data produced by shooting simulation is extremely poor. For example, MBCNN obtains only 10.66 dB PSNR when trained with a 192×192 crop size, indicating a large domain gap between the pseudo and real data. Both the cyclic learning method (Park et al. 2022) and our UnDeM exhibit much better results. Moreover, compared with (Park et al. 2022), our UnDeM successfully models the moiré patterns and thus presents the highest performance. For instance, MBCNN respectively obtains 19.45 dB and 19.89 dB PSNR when trained with crop sizes of 192 and 384; for ESDNet-L, the PSNR results are 19.38 dB and 19.66 dB, respectively. Correspondingly, the SSIM and LPIPS of our UnDeM are also much better than those of shooting simulation and cyclic learning.

| Model | C.S. | Method | PSNR | SSIM | LPIPS |
|---|---|---|---|---|---|
| MBCNN | 192 | Paired | 22.49 | 0.815 | 0.191 |
| MBCNN | 192 | Shooting | 10.66 | 0.477 | 0.570 |
| MBCNN | 192 | Cyclic | 19.15 | 0.722 | 0.257 |
| MBCNN | 192 | UnDeM | 19.45 | 0.732 | 0.230 |
| MBCNN | 384 | Paired | 22.73 | 0.819 | 0.182 |
| MBCNN | 384 | Shooting | 9.32 | 0.513 | 0.572 |
| MBCNN | 384 | Cyclic | 19.36 | 0.733 | 0.265 |
| MBCNN | 384 | UnDeM | 19.89 | 0.735 | 0.226 |
| ESDNet-L | 192 | Paired | 22.86 | 0.823 | 0.143 |
| ESDNet-L | 192 | Shooting | 10.06 | 0.558 | 0.487 |
| ESDNet-L | 192 | Cyclic | 19.09 | 0.738 | 0.241 |
| ESDNet-L | 192 | UnDeM | 19.38 | 0.749 | 0.228 |
| ESDNet-L | 384 | Paired | 23.45 | 0.834 | 0.134 |
| ESDNet-L | 384 | Shooting | 9.81 | 0.553 | 0.512 |
| ESDNet-L | 384 | Cyclic | 19.05 | 0.715 | 0.273 |
| ESDNet-L | 384 | UnDeM | 19.66 | 0.747 | 0.205 |

Table 1: Quantitative results on the FHDMi dataset. C.S. denotes the crop size of the random crop; Paired denotes real paired data.

UHDM. The results on the UHDM dataset are provided in Table 2. Demoiréing models trained on shooting simulation still fail to deal with the real data, while cyclic learning provides better results. More importantly, our UnDeM surpasses these two methods across different networks and training sizes. Specifically, UnDeM increases the PSNR by 0.54 dB, 0.10 dB, and 0.15 dB when training MBCNN with crop sizes of 192, 384, and 768, respectively; for ESDNet-L, the PSNR gains are 0.28 dB, 0.43 dB, and 0.40 dB, respectively.

| Model | C.S. | Method | PSNR | SSIM | LPIPS |
|---|---|---|---|---|---|
| MBCNN | 192 | Paired | 20.14 | 0.760 | 0.346 |
| MBCNN | 192 | Shooting | 8.99 | 0.528 | 0.632 |
| MBCNN | 192 | Cyclic | 17.42 | 0.663 | 0.464 |
| MBCNN | 192 | UnDeM | 17.96 | 0.673 | 0.425 |
| MBCNN | 384 | Paired | 20.14 | 0.759 | 0.356 |
| MBCNN | 384 | Shooting | 9.27 | 0.538 | 0.603 |
| MBCNN | 384 | Cyclic | 17.68 | 0.665 | 0.476 |
| MBCNN | 384 | UnDeM | 17.78 | 0.668 | 0.401 |
| MBCNN | 768 | Paired | 21.41 | 0.793 | 0.332 |
| MBCNN | 768 | Shooting | 9.33 | 0.543 | 0.605 |
| MBCNN | 768 | Cyclic | 17.98 | 0.719 | 0.503 |
| MBCNN | 768 | UnDeM | 18.13 | 0.723 | 0.360 |
| ESDNet-L | 192 | Paired | 21.30 | 0.786 | 0.258 |
| ESDNet-L | 192 | Shooting | 9.80 | 0.606 | 0.544 |
| ESDNet-L | 192 | Cyclic | 18.02 | 0.659 | 0.371 |
| ESDNet-L | 192 | UnDeM | 18.30 | 0.662 | 0.365 |
| ESDNet-L | 384 | Paired | 21.18 | 0.785 | 0.257 |
| ESDNet-L | 384 | Shooting | 10.27 | 0.604 | 0.522 |
| ESDNet-L | 384 | Cyclic | 17.75 | 0.679 | 0.404 |
| ESDNet-L | 384 | UnDeM | 18.18 | 0.688 | 0.361 |
| ESDNet-L | 768 | Paired | 22.12 | 0.799 | 0.245 |
| ESDNet-L | 768 | Shooting | 9.80 | 0.599 | 0.542 |
| ESDNet-L | 768 | Cyclic | 18.00 | 0.697 | 0.423 |
| ESDNet-L | 768 | UnDeM | 18.40 | 0.713 | 0.344 |

Table 2: Quantitative results on the UHDM dataset. C.S. denotes the crop size of the random crop; Paired denotes real paired data. Some results are directly copied from (Yu et al. 2022).

To summarize Table 1 and Table 2, the transferability of our produced moiré images to downstream demoiréing tasks, and the efficacy of our UnDeM over existing methods, are well demonstrated.

### Qualitative Results

Qualitative comparisons of demoiréing images on the UHDM dataset are presented in Fig. 7, with additional results provided in the supplementary materials. As shown in Fig. 7b, the demoiréing results of shooting simulation exhibit unnaturally high brightness, leading to a loss of image detail. This decrease in visual quality can be blamed on the generally darker brightness of shooting simulation, as shown in Fig. 2b, which makes the demoiréing model learn an incorrect brightness relationship between the moiré and moiré-free images. As presented in Fig. 7c, the demoiréing model fails to remove moiré because cyclic learning cannot model the moiré patterns, as illustrated in Fig. 2c. Results in Fig. 7d demonstrate the efficacy of UnDeM in removing moiré patterns, reflecting that UnDeM successfully models the moiré patterns.

Figure 7: Visualization of demoiréing results of MBCNN (crop size: 768) on the UHDM dataset: (a) moiré images; (b) demoiréing results by shooting simulation (Niu, Guo, and Wang 2021); (c) demoiréing results by cyclic learning (Park et al. 2022); (d) demoiréing results by our UnDeM; (e) moiré-free images. For the convenience of demonstration, we crop patches from the test images.

## Conclusion

In this paper, we present UnDeM, which performs real image demoiréing using unpaired real data in a learning-based manner.
We synthesize pseudo moiré images to form paired data for training off-the-shelf demoiréing models. The proposed UnDeM contains three steps: image preprocessing, a moiré generation network, and adaptive denoise. The image preprocessing crops the real moiré images into multiple sub-image patches and groups them into four groups according to the complexity of their moiré patterns. A moiré generation network is applied to synthesize a pseudo moiré image that has the moiré features of its input real moiré image and the image details of its input moiré-free image. The adaptive denoise is introduced to rule out low-quality synthetic moiré images, avoiding their adverse effects on the learning of demoiréing models. UnDeM is demonstrated to improve the quality of synthetic images, and demoiréing models trained on these images are experimentally shown to be superior in performance.

## Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2022ZD0118202), the National Science Fund for Distinguished Young Scholars (No. 62025603), the National Natural Science Foundation of China (No. U21B2037, No. U22B2051, No. 62176222, No. 62176223, No. 62176226, No. 62072386, No. 62072387, No. 62072389, No. 62002305, and No. 62272401), and the Natural Science Foundation of Fujian Province of China (No. 2021J01002, No. 2022J06001).

## References

Cheng, X.; Fu, Z.; and Yang, J. 2019. Multi-scale dynamic feature encoding network for image demoiréing. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 3486–3493.

Cho, T. S.; Zitnick, C. L.; Joshi, N.; Kang, S. B.; Szeliski, R.; and Freeman, W. T. 2011. Image restoration by matching gradient distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 34: 683–694.

Dabov, K.; Foi, A.; Katkovnik, V.; and Egiazarian, K. 2007. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing (TIP), 16: 2080–2095.

Gao, T.; Guo, Y.; Zheng, X.; Wang, Q.; and Luo, X. 2019. Moiré pattern removal with multi-scale feature enhancing network. In Proceedings of the IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 240–245.

Goodfellow, I. J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2672–2680.

Hasler, D.; and Suesstrunk, S. E. 2003. Measuring colorfulness in natural images. In Human Vision and Electronic Imaging VIII, volume 5007, 87–95.

He, B.; Wang, C.; Shi, B.; and Duan, L.-Y. 2019. Mop moiré patterns using MopNet. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2424–2432.

He, B.; Wang, C.; Shi, B.; and Duan, L.-Y. 2020. FHDe2Net: Full high definition demoireing network. In Proceedings of the European Conference on Computer Vision (ECCV), 713–729.

Hu, X.; Jiang, Y.; Fu, C.-W.; and Heng, P.-A. 2019. Mask-ShadowGAN: Learning to remove shadows from unpaired data. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2472–2481.

Isola, P.; Zhu, J.-Y.; Zhou, T.; and Efros, A. A. 2017. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1125–1134.

Kingma, D. P.; and Ba, J. 2014. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations (ICLR).
Liu, B.; Shu, X.; and Wu, X. 2018. Demoiréing of camera-captured screen images using deep convolutional neural network. arXiv preprint arXiv:1804.03809.

Liu, F.; Yang, J.; and Yue, H. 2015. Moiré pattern removal from texture images via low-rank and sparse matrix decomposition. In IEEE Visual Communications and Image Processing (VCIP), 1–4.

Liu, L.; Liu, J.; Yuan, S.; Slabaugh, G.; Leonardis, A.; Zhou, W.; and Tian, Q. 2020. Wavelet-based dual-branch network for image demoiréing. In Proceedings of the European Conference on Computer Vision (ECCV), 86–102.

Liu, Z.; Yin, H.; Wu, X.; Wu, Z.; Mi, Y.; and Wang, S. 2021. From shadow generation to shadow removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4927–4936.

Mao, X.; Li, Q.; Xie, H.; Lau, R. Y.; Wang, Z.; and Paul Smolley, S. 2017. Least squares generative adversarial networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2794–2802.

Marr, D.; and Hildreth, E. 1980. Theory of edge detection. Proceedings of the Royal Society of London. Series B. Biological Sciences, 207: 187–217.

Niu, D.; Guo, R.; and Wang, Y. 2021. Moiré Attack (MA): A New Potential Risk of Screen Photos. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 26117–26129.

Park, H.; Vien, A. G.; Kim, H.; Koh, Y. J.; and Lee, C. 2022. Unpaired screen-shot image demoiréing with cyclic moiré learning. IEEE Access, 10: 16254–16268.

Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 8026–8037.

Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115: 211–252.

Siddiqui, H.; Boutin, M.; and Bouman, C. A. 2009. Hardware-friendly descreening. IEEE Transactions on Image Processing (TIP), 19: 746–757.

Sun, B.; Li, S.; and Sun, J. 2014. Scanned image descreening with image redundancy and adaptive filtering. IEEE Transactions on Image Processing (TIP), 23: 3698–3710.

Sun, Y.; Yu, Y.; and Wang, W. 2018. Moiré photo restoration using multiresolution convolutional neural networks. IEEE Transactions on Image Processing (TIP), 27: 4160–4172.

Ulyanov, D.; Vedaldi, A.; and Lempitsky, V. 2016. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022.

Wang, Z.; Bovik, A. C.; Sheikh, H. R.; and Simoncelli, E. P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing (TIP), 13: 600–612.

Yang, J.; Liu, F.; Yue, H.; Fu, X.; Hou, C.; and Wu, F. 2017a. Textured image demoiréing via signal decomposition and guided filtering. IEEE Transactions on Image Processing (TIP), 26: 3528–3541.

Yang, J.; Zhang, X.; Cai, C.; and Li, K. 2017b. Demoiréing for screen-shot images with multi-channel layer decomposition. In IEEE Visual Communications and Image Processing (VCIP), 1–4.

Yu, X.; Dai, P.; Li, W.; Ma, L.; Shen, J.; Li, J.; and Qi, X. 2022. Towards efficient and scale-robust ultra-high-definition image demoiréing. In Proceedings of the European Conference on Computer Vision (ECCV), 646–662.
Yuan, S.; Timofte, R.; Slabaugh, G.; Leonardis, A.; Zheng, B.; Ye, X.; Tian, X.; Chen, Y.; Cheng, X.; Fu, Z.; et al. 2019. AIM 2019 challenge on image demoireing: Methods and results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 3534–3545.

Yue, H.; Cheng, Y.; Liu, F.; and Yang, J. 2021. Unsupervised moiré pattern removal for recaptured screen images. Neurocomputing, 456: 352–363.

Zhang, R.; Isola, P.; Efros, A. A.; Shechtman, E.; and Wang, O. 2018. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 586–595.

Zhang, Y.; Lin, M.; Li, X.; Liu, H.; Wang, G.; Chao, F.; Ren, S.; Wen, Y.; Chen, X.; and Ji, R. 2023. Real-Time Image Demoireing on Mobile Devices. In Proceedings of the International Conference on Learning Representations (ICLR).

Zheng, B.; Yuan, S.; Slabaugh, G.; and Leonardis, A. 2020. Image demoireing with learnable bandpass filters. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3636–3645.

Zheng, B.; Yuan, S.; Yan, C.; Tian, X.; Zhang, J.; Sun, Y.; Liu, L.; Leonardis, A.; and Slabaugh, G. 2021. Learning frequency domain priors for image demoireing. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 44: 7705–7717.