# realtime_image_demoiracuteeing_on_mobile_devices__6f2476dd.pdf

Published as a conference paper at ICLR 2023

REAL-TIME IMAGE DEMOIRÉING ON MOBILE DEVICES

Yuxin Zhang1, Mingbao Lin1, Xunchao Li1, Han Liu2, Guozhi Wang2, Fei Chao1, Shuai Ren2, Yafei Wen2, Xiaoxin Chen2, Rongrong Ji1,3,4

1MAC Lab, School of Informatics, Xiamen University, 2VIVO AI Lab, 3Institute of Artificial Intelligence, Xiamen University, 4Pengcheng Lab

Moiré patterns appear frequently when taking photos of digital screens, drastically degrading the image quality. Despite the advance of CNNs in image demoiréing, existing networks are heavily designed and impose a redundant computation burden on mobile devices. In this paper, we launch the first study on accelerating demoiréing networks and propose a dynamic demoiréing acceleration method (DDA) towards real-time deployment on mobile devices. Our motivation stems from a simple yet universal fact: moiré patterns are often unevenly distributed across an image. Consequently, excessive computation is wasted on non-moiré areas. We therefore reallocate computation costs in proportion to the complexity of image patches. To this end, we measure the complexity of an image patch with a novel moiré prior that considers both the colorfulness and the frequency information of moiré patterns. Then, we restore image patches of higher complexity using larger networks, while patches of lower complexity are assigned to smaller networks to relieve the computation burden. Finally, we train all networks in a parameter-shared supernet paradigm to avoid additional parameter burden. Extensive experiments on several benchmarks demonstrate the efficacy of our proposed DDA. In addition, the acceleration evaluated on the VIVO X80 Pro smartphone equipped with a Snapdragon 8 Gen 1 chip shows that our method can drastically reduce the inference time, leading to real-time image demoiréing on mobile devices. Source code and models are released at https://github.com/zyxxmu/DDA.

1 INTRODUCTION

Moiré patterns (Sun et al., 2018; Yang et al., 2017b) are an image artifact that appears in particular in television and digital photography. In contemporary society, taking pictures of screens with mobile phones has become one of the most productive ways to record information quickly. Nevertheless, moiré patterns occur frequently due to the interference between the color filter array (CFA) of the camera and high-frequency repetitive signals. The resulting stripes of different colors and frequencies on the captured photos drastically degrade the visual quality. Therefore, developing demoiréing algorithms has long received attention in the research community, and yet the problem remains unsolved, in particular when running algorithms on mobile devices.

Early studies on image demoiréing resort to traditional machine learning techniques such as low-rank and sparse matrix decomposition (Liu et al., 2015) and bandpass filters (Yang et al., 2017a). The rise of convolutional neural networks (CNNs) has vastly boosted the efficacy of image demoiréing (He et al., 2019; Zheng et al., 2020). However, the improved quantitative performance, such as PSNR (Peak Signal-to-Noise Ratio), comes at increasing costs in energy and computation. For example, MBCNN (Zheng et al., 2020) consumes 4.22T floating-point operations (FLOPs) to restore a 1920 × 1080 smartphone-taken moiré image. Given that moiré patterns mostly emerge in mobile photography, such massive computation incurs considerable inference latency, preventing users from enjoying a real-time demoiréing experience. Such a handicap could be more serious when it comes to video demoiréing. Therefore, there is a great need to bridge the technology gap between academia and industry.

Work done during Yuxin Zhang's internship at VIVO AI Lab. Corresponding author: rrji@xmu.edu.cn

Figure 1: Images with moiré patterns. The blue boxes show a zoomed-in area. Two phenomena can be observed: (1) The moiré complexity varies significantly across different areas of an image. (2) Moiré patterns are mainly characterized by frequency and colorfulness.

To tackle the aforementioned issue, we initiate the first study on accelerating demoiréing networks towards real-time deployment on mobile devices. Our motivation arises from an empirical observation of moiré images. As shown in Fig. 1, an image is often only partially contaminated by moiré patterns. Some areas are filled with intensive moiré stripes, some are mildly affected, and some are free of moiré pollution. It is natural to allocate more computation to the moiré-centralized areas and less to the diluted ones. In the extreme case, there is no need to cleanse unaffected areas. Unfortunately, existing methods (Sun et al., 2018; Zheng et al., 2020) do not treat the different areas of an image differently. They not only waste excessive computation on non-moiré areas but also bring about side effects, such as over-whitened image contents. Therefore, reallocating computation costs according to the complexity of a moiré area can be a potential solution to accomplish real-time image demoiréing on mobile devices.

Motivated by the above analysis, we split a whole image into several sub-image patches. To measure the moiré complexity of a patch, we introduce a novel moiré prior. As can be seen in Fig. 1, moiré patterns are characterized by either high frequency or rich color information. Thus, we define the moiré prior as the product of the frequency and color information in a patch. In detail, we model the frequency information with a Gaussian filter, and the colorfulness metric is a linear combination of the mean and standard deviation of the pixel cloud in the RGB color space (Hasler & Suesstrunk, 2003). Using this prior to measure the moiré complexity, each image patch is then processed by a dedicated network whose computation cost is in proportion to the moiré complexity. In this fashion, larger networks are utilized to restore moiré-centralized areas to ensure the recovery quality, while smaller networks are leveraged to restore moiré-diluted areas to relieve the computation burden. Thus, we obtain a better tradeoff between image quality and resource requirements on mobile devices.

Nevertheless, multiple networks lead to a larger parameter burden, which also causes deployment pressure due to the limited memory on mobile devices. To mitigate this issue, we leverage the supernet paradigm (Yang et al., 2021) to jointly train all networks in a parameter-shared manner. Concretely, we regard the vanilla demoiréing network as a supernet, and weight-shared subnets of different sizes are directly extracted from this supernet to process image patches of different demoiréing complexity. During the training phase, each subnet is dynamically trained using the corresponding image patches with similar moiré complexity. Consequently, the overall running overhead can be effectively reduced without introducing any additional parameters.

We have conducted extensive experiments on accelerating existing demoiréing networks on the LCDMoiré (Yuan et al., 2019) and FHDMi (He et al., 2020) benchmarks. The results show that our dynamic demoiréing acceleration method, termed DDA, achieves an obvious FLOPs reduction even with PSNR and SSIM increases. For instance, DDA reduces the FLOPs of the state-of-the-art demoiréing network MBCNN (Zheng et al., 2020) by 45.2% with a 0.35 dB PSNR increase. Furthermore, the accelerated DMCNN (Sun et al., 2018) leads to real-time image demoiréing on the VIVO X80 Pro smartphone, with a tiny latency of 48.0 ms when processing an image of 1920 × 1080 resolution.

Figure 2: The framework of our proposed dynamic demoiréing acceleration method (DDA). Panel labels: moiré image; sub-image patches with varying moiré complexity; models with different computation burden; restored sub-image; clean image.

This work addresses the problem of demoiréing network deployment on resource-limited mobile devices. The key contributions of this paper include: (1) A novel framework for accelerating image demoiréing networks in a dynamic manner. (2) An effective moiré prior to identify the demoiréing complexity of a given image patch. (3) Performance maintenance and apparent acceleration on modern smartphone devices.

2 RELATED WORK

2.1 IMAGE DEMOIRÉING

Image demoiréing aims at removing moiré patterns from captured images. Early work mainly focuses on manually designed algorithms with the aid of low-rank and sparse matrix decomposition (Yang et al., 2017a; Liu et al., 2015). With the explosion and popularity of deep learning, numerous demoiréing networks have been proposed in recent years to achieve moiré removal in an end-to-end manner (Sun et al., 2018; He et al., 2019). As a pioneering work, Sun et al. (2018) proposed a multiscale network structure to remove moiré patterns at different frequencies and scales. He et al. (2019) designed specific learning schemes to resolve the unique properties of moiré patterns, including frequency distributions, edge information and appearance attributes. Zheng et al. (2020) reformulated the image demoiréing problem as moiré texture removal and color restoration, and proposed MBCNN, which consists of a learnable bandpass filter to learn the frequency prior and a two-step tone mapping mechanism to restore color information. FHDe2Net (He et al., 2020) uses a global branch to eradicate multi-scale moiré patterns and a local branch to preserve fine details. Liu et al. (2020) further proposed WaveletNet to handle demoiréing in the wavelet domain. The common dilemma of the aforementioned networks is their huge computation burden, which greatly hinders practical deployment on mobile devices.

2.2 DYNAMIC NETWORKS AND SUPERNETS

Dynamic networks adapt the network structures or parameters w.r.t. different inputs (Kong et al., 2021; Huang et al., 2017; Bolukbasi et al., 2017). Due to their advantages in accuracy and computation efficiency, dynamic networks have received increasing research interest in recent years. A comprehensive overview of dynamic networks can be found in Han et al. (2021). Supernets are a type of dynamic network that keeps weight-shared sub-networks of multiple sizes within only one network, and randomly samples these sub-networks for training (Chen et al., 2022; Yang et al., 2021; Yu & Huang, 2019). According to the constraints of available resources, different sub-networks with varying widths and resolutions can be adaptively chosen during the testing phase without introducing additional parameter burden. Inspired by these studies, our proposed DDA adopts the supernet paradigm by dynamically allocating image patches with different demoiréing complexity to their corresponding sub-networks.

Figure 3: The sorted sub-image patches w.r.t. moiré complexity scored by our proposed moiré prior. Axis label: moiré complexity score.

3 METHODOLOGY

Fig. 2 shows the general framework of our dynamic demoiréing acceleration method (DDA). A moiré image is first split into several sub-image patches, which are then reorganized into different groups according to the patch moiré complexity. This is achieved by a novel moiré prior that considers both the color and frequency information of moiré patterns. We detail it in Sec. 3.1. For the purpose of real-time image demoiréing on mobile devices, we train multiple networks of different complexity to process patches in different groups. In the training and testing phases, image patches of higher complexity are fed to larger networks and patches of lower complexity are handled by smaller networks. Sec. 3.2 gives the implementation. Finally, as an alternative to training separate networks, which results in more parameters, we regard each one as a subnet of the vanilla demoiréing network (supernet), leading to a weight-shared training paradigm as described in Sec. 3.3.

3.1 MOIRÉ PRIOR

The complexity of moiré patterns can be judged by a human; however, it is costly and laborious to manually define the complexity for all patches in every moiré image. Many former works on dynamic networks (Han et al., 2021; Kong et al., 2021) train an additional module to adapt the network w.r.t. different inputs, which, however, brings extra parameters and computation since the module is composed of several convolutional or fully-connected layers. Given our plan of deploying demoiréing networks on mobile devices, such a solution is not feasible, or at least not optimal.

We propose a novel moiré prior to measure the moiré complexity of an image patch in a fast manner. The motivation for this prior comes from a close observation of moiré images. As can be seen from Fig. 3, moiré patterns vary a lot in frequency and colorfulness. Typically, a perceptible moiré pattern is highlighted by either high frequency or rich color information. Therefore, a prior reflecting both image frequency and colorfulness can be an effective way to model the intensity of moiré patterns. Denoting a moiré image as $X$, we first decompose it into sub-image patches $\{x_i\}_{i=1}^{N}$. For a specific patch $x$, we use the Gaussian high-pass filter (Dogra & Bhalla, 2014), with a standard deviation of 5 for the Gaussian distribution, to extract the frequency information as $F(x)$. To measure the patch colorfulness, we consider a linear combination of the mean and standard deviation of the pixel cloud in the RGB color space (Hasler & Suesstrunk, 2003):

$$C(x) = \sqrt{\sigma^2(x_R - x_G) + \sigma^2\big(0.5(x_R + x_G) - x_B\big)} + 0.3\sqrt{\mu^2(x_R - x_G) + \mu^2\big(0.5(x_R + x_G) - x_B\big)}, \tag{1}$$

where $\mu(\cdot)$ and $\sigma(\cdot)$ are the mean and standard deviation functions, and $x_R$, $x_G$, $x_B$ denote the R, G, B color channels, respectively. Here 0.3 is a parameter found by Hasler & Suesstrunk (2003) through maximizing the correlation between the experimental data and the colorfulness metric. Refer to Hasler & Suesstrunk (2003) for more principles of measuring image colorfulness. Our proposed moiré complexity score using the frequency and colorfulness priors is finally defined as:

$$M(x) = C(x) \cdot \mu\big(F(x)\big), \tag{2}$$

where $\mu(\cdot)$ is the mean function. Fig. 3 shows that $M(x)$ can be a reliable metric for evaluating the patch moiré complexity. Notice that, since no extra network module is built, the operations of our moiré prior are highly cheap and bring a negligible computation burden.
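The prior in Eqs. (1) and (2) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch under stated assumptions, not the released implementation: the function names are hypothetical, the high-pass response is approximated in the spatial domain as the absolute difference between a grayscale patch and its Gaussian blur (sigma = 5), and the grayscale conversion is a plain channel average.

```python
import math
from statistics import fmean, pstdev


def colorfulness(pixels):
    """Eq. (1) for a flat list of (R, G, B) tuples."""
    rg = [r - g for r, g, b in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]
    spread = math.sqrt(pstdev(rg) ** 2 + pstdev(yb) ** 2)
    offset = math.sqrt(fmean(rg) ** 2 + fmean(yb) ** 2)
    return spread + 0.3 * offset


def _blur_rows(image, kernel):
    """Blur every row with a 1-D kernel, replicating border pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [
            sum(
                kernel[k + radius] * row[min(max(col + k, 0), width - 1)]
                for k in range(-radius, radius + 1)
            )
            for col in range(width)
        ]
        for row in image
    ]


def _transpose(image):
    return [list(col) for col in zip(*image)]


def high_frequency(gray, sigma=5.0):
    """Assumed spatial-domain high-pass: |gray - GaussianBlur(gray)|."""
    radius = int(3 * sigma)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [v / total for v in kernel]
    # Separable blur: rows first, then columns (via transpose).
    blurred = _transpose(_blur_rows(_transpose(_blur_rows(gray, kernel)), kernel))
    return [
        [abs(g - b) for g, b in zip(gray_row, blur_row)]
        for gray_row, blur_row in zip(gray, blurred)
    ]


def moire_score(patch):
    """Eq. (2): M(x) = C(x) * mean(F(x)); patch is a list of rows of (R, G, B)."""
    pixels = [px for row in patch for px in row]
    gray = [[sum(px) / 3.0 for px in row] for row in patch]  # assumed gray conversion
    response = high_frequency(gray)
    return colorfulness(pixels) * fmean(v for row in response for v in row)
```

A flat gray patch scores zero because both factors vanish, whereas a patch with alternating colored stripes receives a large score, which matches the intended ordering of patches.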

3.2 DYNAMIC DEMOIRÉING ACCELERATION

In image demoiréing, a moiré-polluted image $X$ is expected to be restored to its moiré-free ground truth in natural scenes. A traditional demoiréing process is formulated using CNNs as:

$$Y = F(X; \Theta), \tag{3}$$

where $F$ stands for the demoiréing network with its parameters denoted as $\Theta$. As can be seen from Eq. (3), existing methods restore all areas of a moiré image with the same network $F$, which wastes excessive computation since the moiré complexity varies significantly across different areas of an image, as discussed in Sec. 1. This conflicts with our goal of real-time image demoiréing on mobile devices. A natural way to accelerate demoiréing is to reallocate computation costs according to the complexity of a moiré area.

To that effect, we reorganize the sub-image patches $\{x_i\}_{i=1}^{N}$ of $X$ in ascending order of their moiré complexity score defined in Eq. (2), and then split the ordered patches into $M$ groups denoted as $\{G_1, G_2, ..., G_M\}$. Each group $G_i$ contains the image patches whose moiré complexity scores rank from the $\big(\frac{(i-1)N}{M}+1\big)$-th to the $\frac{iN}{M}$-th smallest among all patches. Then, we construct $M$ different demoiréing networks $\{F_i\}_{i=1}^{M}$ with parameters of different sizes $\{\Theta_i\}_{i=1}^{M}$, and process each image patch $x \in G_i$ using the $i$-th network $F_i$ as:

$$y = F_i(x; \Theta_i \mid x \in G_i). \tag{4}$$

In our setting, the complexity of $F_i$ is smaller than that of $F_{i+1}$, such that less complex image patches are handled by networks with low computation costs, and vice versa. Eq. (4) dynamically accelerates the computation of Eq. (3) by assigning computation costs in line with the degree of moiré complexity. Meanwhile, the recovery quality is still ensured as moiré-centralized areas are restored using larger networks. Finally, the moiré-free output of our dynamic demoiréing acceleration method (DDA) is obtained by concatenating the patch outputs of all networks:

$$Y = \mathrm{concat}\big(F_1(x; \Theta_1 \mid x \in G_1),\ F_2(x; \Theta_2 \mid x \in G_2),\ ...,\ F_M(x; \Theta_M \mid x \in G_M)\big), \tag{5}$$

where $\mathrm{concat}(\cdot)$ concatenates the output patches to construct a moiré-free full image.
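The grouping and dispatch of Eqs. (4) and (5) can be illustrated with a minimal sketch. The helper names below are hypothetical, the networks are stand-in callables rather than real demoiréing models, and group boundaries use integer division, which is one possible way to handle a patch count that is not divisible by the number of groups.

```python
def group_by_complexity(scores, num_groups):
    """Return num_groups lists of patch indices, lowest-complexity group first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    return [
        order[g * n // num_groups:(g + 1) * n // num_groups]
        for g in range(num_groups)
    ]


def dynamic_demoire(patches, scores, networks):
    """Eqs. (4)-(5): restore each patch with the network assigned to its group.

    `networks` is ordered from smallest to largest, so the group with the
    lowest moire complexity is handled by the cheapest network.
    """
    groups = group_by_complexity(scores, len(networks))
    restored = [None] * len(patches)
    for network, indices in zip(networks, groups):
        for i in indices:
            restored[i] = network(patches[i])
    return restored  # same order as `patches`, ready to be stitched together


# Stand-in "networks" that only tag which model processed a patch.
networks = [lambda p: ("small", p), lambda p: ("medium", p), lambda p: ("large", p)]
patches = ["a", "b", "c", "d", "e", "f"]
scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
result = dynamic_demoire(patches, scores, networks)
```

Because the outputs are written back at the original patch indices, stitching them into the full image only requires the patch layout used for splitting.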

3.3 SUPERNET TRAINING

Though the aforesaid procedure reduces the overall computation costs, a challenge arises with respect to the parameter burden when deploying our demoiréing method on mobile devices with limited memory. In the simple case of setting the largest network $F_M$ as the vanilla demoiréing network $F$, additional parameters of $\sum_{i=1}^{M-1} |\Theta_i|$ are introduced in total.

To solve this, instead of training the networks $\{F_i\}_{i=1}^{M}$ in isolation, we further propose to use the supernet paradigm (Yang et al., 2021) to train and infer all networks in a parameter-shared manner. In detail, the vanilla demoiréing network $F$ is regarded as a supernet and its parameters $\Theta$ are partly shared by $F_i$. Supposing the network width of $F_i$ is $W_i$, the subnet $F_i$ inherits the first $W_i$ proportion of the convolution filter weights, denoted as $\Theta[W_i]$. Consequently, the network $F_i$ becomes a subnet of $F$. Our moiré-free output is therefore finally reformulated as:

$$Y = \mathrm{concat}\big(F(x; \Theta[W_1] \mid x \in G_1),\ F(x; \Theta[W_2] \mid x \in G_2),\ ...,\ F(x; \Theta[W_M] \mid x \in G_M)\big). \tag{6}$$

As a consequence, image demoiréing can be effectively accelerated without any additional parameter burden, which finally reaches our target of real-time deployment on mobile devices.
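As a rough illustration of how a subnet can inherit the first $W_i$ proportion of filters, the sketch below slices a convolution weight stored as nested lists indexed as `weight[out_channel][in_channel]`. The function name, the rounding rule and the `slim_in`/`slim_out` switches (useful for the first and last layers, whose RGB channels presumably stay fixed) are assumptions made for this example and are not taken from the released code.

```python
def subnet_filters(weight, width, slim_in=True, slim_out=True):
    """Keep the first `width` fraction of output filters and input channels.

    weight[o][i] holds the kernel connecting input channel i to output
    channel o. Slicing copies only the list structure; the kernels
    themselves stay the supernet's objects, so no parameters are added.
    """
    out_channels, in_channels = len(weight), len(weight[0])
    keep_out = max(1, round(out_channels * width)) if slim_out else out_channels
    keep_in = max(1, round(in_channels * width)) if slim_in else in_channels
    return [filt[:keep_in] for filt in weight[:keep_out]]


# Toy supernet layer with 8 output filters and 8 input channels.
supernet_layer = [[[float(o), float(i)] for i in range(8)] for o in range(8)]
quarter = subnet_filters(supernet_layer, 0.25)
three_quarters = subnet_filters(supernet_layer, 0.75)
first_layer = subnet_filters(supernet_layer, 0.5, slim_in=False)
```

Every kernel of the narrower subnet is also a kernel of the wider ones, which is what allows all $M$ subnets to be stored within the parameters of the single supernet.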

4 EXPERIMENT

4.1 EXPERIMENT SETUP

4.1.1 DATASETS

There are three main public datasets for image demoiréing. (1) The LCDMoiré dataset (Yuan et al., 2019) from the AIM19 image demoiréing challenge consists of 10,200 synthetically generated image pairs, including 10,000 training images, 100 validation images and 100 testing images at 1024 × 1024 resolution. (2) The FHDMi dataset (He et al., 2020) contains 9,981 image pairs for training and 2,019 for testing at 1920 × 1080 resolution. (3) The TIP2018 dataset (Huang et al., 2017) consists of real photographs constructed by photographing 400 × 400 images from ImageNet (Deng et al., 2009) displayed on computer screens. In this paper, we conduct experiments on the LCDMoiré and FHDMi datasets. We do not consider the TIP2018 benchmark since its resolution is too small for our target of image demoiréing on mobile devices, whose captured images generally have a high resolution of 1920 × 1080 or more.

Table 1: Ablation study for applying different width list configurations in supernet training.

| W | FHDMi PSNR | FHDMi SSIM | FHDMi Params | AIM (LCDMoiré) PSNR | AIM (LCDMoiré) SSIM | AIM (LCDMoiré) Params |
|---|---|---|---|---|---|---|
| {0.4, 0.5, 0.6} | 22.81 | 0.8124 | 10.61M | 41.68 | 0.9869 | 10.61M |
| {0.25, 0.5, 0.75} | 23.07 | 0.8766 | 11.88M | 41.43 | 0.9852 | 11.88M |
| {0.1, 0.5, 0.9} | 21.03 | 0.7988 | 13.15M | 40.21 | 0.9791 | 13.15M |

4.1.2 EVALUATION PROTOCOLS

We adopt the widely used metrics of PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity) to conduct a quantitative comparison of demoiréing performance. We also report the color difference between the restored images and the clean images using the CIE Delta E 2000 (Sharma et al., 2005) measurement, denoted as ΔE. To evaluate acceleration, we report the floating-point operations (FLOPs) of the networks and their per-image latency on the VIVO X80 Pro smartphone with the Snapdragon 8 Gen 1 chip.
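For reference, PSNR follows the standard definition $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The sketch below computes it for images given as flat lists of pixel values, assuming an 8-bit peak value of 255; the function name and the flat-list layout are choices made for this example only.

```python
import math


def psnr(restored, clean, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized pixel lists."""
    mse = sum((r - c) ** 2 for r, c in zip(restored, clean)) / len(clean)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

With a mean squared error of 1 on 8-bit data the value is $20 \log_{10} 255 \approx 48.13$ dB, which gives a sense of scale for the PSNR differences reported in the tables.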

4.1.3 BASELINES

We choose to accelerate DMCNN (Sun et al., 2018) and MBCNN (Zheng et al., 2020) to verify the efficacy of our DDA method. DMCNN is a pioneering network for image demoiréing with a multiscale structure. MBCNN is a state-of-the-art demoiréing network, which consists of a learnable bandpass filter to learn the frequency prior and a two-step tone mapping mechanism to restore color information. We report the results of the baseline models as well as their compact versions obtained by slimming the network width, based on our re-implementation in the PyTorch framework (Paszke et al., 2019).

4.1.4 IMPLEMENTATION DETAILS

Our implementation of DDA is based on the PyTorch framework (Paszke et al., 2019), with the group number M = 3 and the width list W = {0.25, 0.5, 0.75} on FHDMi and W = {0.4, 0.5, 0.6} on LCDMoiré. We split the original images of the LCDMoiré and FHDMi datasets into sub-image patches of 512 × 512 and 640 × 540, respectively. Then, we classify the sub-image patches into multiple groups with different moiré complexity using our proposed moiré prior. We train the supernet using the Adam optimizer (Kingma & Ba, 2014). The initial learning rate and batch size are set to 1e-4 and 4 in all experiments. During training, we iteratively extract a batch of image pairs within a specific class of moiré complexity, which is used to train the subnet of the corresponding width extracted from the supernet. For DMCNN, we train for 200 epochs with the learning rate divided by 10 at the 100-th and 150-th epochs. For MBCNN, we follow Zheng et al. (2020) to reduce the learning rate by half if the decrease in the validation loss is lower than 0.001 dB for four consecutive epochs, and stop training once the learning rate becomes lower than 1e-6. All experiments are run on NVIDIA Tesla V100 GPUs.

4.2 PERFORMANCE ANALYSIS

In this section, we perform a detailed performance analysis of the different components of our DDA, including the supernet training paradigm and the moiré prior.

4.2.1 SUPERNET TRAINING

We first conduct experiments to investigate how the hyper-parameters of supernet training, namely the width list configuration W, influence the performance of DDA. The experiments are conducted on the FHDMi (He et al., 2020) and LCDMoiré (Yuan et al., 2019) datasets using MBCNN. We can observe from Tab. 1 that a dispersed configuration W = {0.25, 0.5, 0.75} performs better on the FHDMi dataset, while a compact configuration W = {0.4, 0.5, 0.6} works better on the LCDMoiré dataset. To explain, LCDMoiré is a synthetic dataset in which the moiré patterns are distributed more evenly than in FHDMi, which is captured using embedded cameras. Generally speaking, a more dispersed width list serves our purpose of dynamically removing moiré from patches of different complexity in real scenarios.

Table 2: Ablation study for dynamic acceleration with/without supernet training.

| Method | PSNR | Params |
|---|---|---|
| w supernet | 23.62 | 11.88M |
| w/o supernet | 23.15 | 30.16M |

Table 3: Ablation study for restoring image patches from the same group with different widths.

| Network | G1 | G2 | G3 |
|---|---|---|---|
| MBCNN-0.75 | 24.03 | 22.21 | 21.29 |
| MBCNN-0.5 | 24.12 | 22.02 | 20.22 |
| MBCNN-0.25 | 23.99 | 21.81 | 19.69 |

Table 4: Ablation study for the moiré prior.

| Prior | PSNR |
|---|---|
| C | 22.44 |
| F | 22.18 |
| C+F | 22.42 |
| Ours | 23.62 |

Table 5: Quantitative results for demoiréing acceleration on the FHDMi dataset.

| Method | PSNR | SSIM | ΔE | FLOPs | FLOPs reduction | Params | Params reduction | Latency |
|---|---|---|---|---|---|---|---|---|
| DMCNN | 21.69 | 0.7731 | 6.67 | 699.16G | 0.0% | 1.43M | 0.0% | 69.4ms |
| DMCNN-0.75 | 21.56 | 0.7704 | 6.81 | 476.62G | 31.4% | 0.99M | 29.9% | 55.2ms |
| DMCNN-0.5 | 21.11 | 0.7691 | 7.14 | 314.82G | 54.8% | 0.68M | 52.1% | 47.9ms |
| DMCNN-0.25 | 20.63 | 0.7655 | 7.98 | 208.73G | 70.1% | 0.48M | 66.6% | 41.8ms |
| DDA | 21.86 | 0.7708 | 6.55 | 333.39G | 52.3% | 0.99M | 29.9% | 48.0ms |
| MBCNN | 23.27 | 0.8201 | 5.38 | 4.22T | 0.0% | 14.21M | 0.0% | 259.8ms |
| MBCNN-0.75 | 22.51 | 0.8113 | 6.11 | 3.05T | 27.3% | 11.88M | 16.4% | 192.4ms |
| MBCNN-0.5 | 22.12 | 0.8077 | 6.32 | 2.07T | 47.5% | 9.92M | 30.2% | 147.2ms |
| MBCNN-0.25 | 21.83 | 0.7991 | 6.54 | 1.28T | 60.8% | 8.36M | 41.2% | 119.7ms |
| DDA | 23.62 | 0.8293 | 5.21 | 2.13T | 45.2% | 11.88M | 16.4% | 147.1ms |

Besides, Tab. 2 compares the performance of accelerating MBCNN on LCDMoiré between supernet training and training each sub-network separately. It can be seen that supernet training does not lead to performance degradation, and it drastically reduces the parameter burden compared with keeping multiple sub-networks simultaneously. The result well demonstrates the effectiveness of our DDA for practical deployment.

4.2.2 MOIRÉ PRIOR

We further analyze the performance of the proposed moiré prior. The experiments are conducted on the FHDMi dataset using MBCNN. We use networks of different widths to infer the groups of patches with different complexity classified by our proposed moiré prior. The results in Tab. 3 show that all widths perform similarly for the easiest group, while a larger width significantly outperforms a smaller width for the group with the highest moiré complexity. Such results demonstrate the effectiveness of our proposed moiré prior, and also validate our point that using large networks to restore patches with low moiré complexity wastes massive computation resources.

Finally, we investigate three variants of our proposed moiré prior: (1) only using high-frequency information (denoted as F), (2) only using colorfulness information (denoted as C), and (3) adding the two scores instead of multiplying them as in Eq. (2) (denoted as C+F). As shown in Tab. 4, all variants result in worse performance, which well demonstrates the efficacy of our proposed moiré prior that considers both the color and frequency properties of moiré patterns. It is worth mentioning that C+F is dominated by the colorfulness measurement, since the scores given by C are generally two orders of magnitude larger than those of F in our observation. As a result, C+F leads to a performance similar to that of C. In contrast, by multiplying both scores, our prior offers a reliable moiré complexity for a given image patch.
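The domination effect of the additive variant can be seen in a toy example. The three (colorfulness, frequency) pairs below are made-up numbers chosen only so that C is roughly two orders of magnitude larger than F; they are not measurements from the paper.

```python
def rank(patches, score):
    """Patch names sorted from highest to lowest score."""
    ordered = sorted(patches, key=lambda p: score(p[1], p[2]), reverse=True)
    return [name for name, _, _ in ordered]


# (name, colorfulness C, mean high-frequency response F): hypothetical values
patches = [("A", 50.0, 0.10), ("B", 40.0, 0.90), ("C", 60.0, 0.05)]

by_color = rank(patches, lambda c, f: c)        # variant "C"
by_sum = rank(patches, lambda c, f: c + f)      # variant "C+F"
by_product = rank(patches, lambda c, f: c * f)  # Eq. (2)
```

Here the additive score produces exactly the same ranking as colorfulness alone, while the product lets the patch with strong high-frequency content rise to the top.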

4.3 QUANTITATIVE COMPARISON

Tab. 5 and Tab. 6 report the quantitative results of our DDA for accelerating DMCNN and MBCNN. On FHDMi, DDA surprisingly improves the PSNR of MBCNN by 0.35 dB even with a FLOPs reduction of 45.2%. We attribute this to the fact that the original baseline uses the same network to restore areas with very few or no moiré patterns, which may damage the original details of the image. Such degraded results hinder the usage of demoiréing networks in practical deployment. In contrast, DDA leverages the smallest network to restore these non-moiré areas, leading to a better global demoiréing effect. Besides, we demonstrate the effectiveness of DDA by comparing the MBCNN accelerated by it with several state-of-the-art demoiréing networks, including MDDM (Cheng et al., 2019), MopNet (He et al., 2019) and FHDe2Net (He et al., 2020), on the FHDMi dataset. As can be seen from Tab. 7, DDA outperforms the other networks regarding both complexity reduction and demoiréing performance. For instance, DDA surpasses FHDe2Net by 0.69 dB PSNR with far fewer FLOPs (2.13T for DDA and 11.41T for FHDe2Net), which shows the correctness and effectiveness of our perspective of reallocating computation costs in proportion to the moiré complexity of image patches.

Table 6: Quantitative results for demoiréing acceleration on the LCDMoiré dataset.

| Method | PSNR | SSIM | ΔE | FLOPs | FLOPs reduction | Params | Params reduction | Latency |
|---|---|---|---|---|---|---|---|---|
| DMCNN | 34.58 | 0.9612 | 1.76 | 353.11G | 0.0% | 1.43M | 0.0% | 35.2ms |
| DMCNN-0.75 | 33.41 | 0.9589 | 1.84 | 242.24G | 31.4% | 0.99M | 29.9% | 28.0ms |
| DMCNN-0.5 | 32.99 | 0.9604 | 1.90 | 159.67G | 54.8% | 0.68M | 52.1% | 24.2ms |
| DMCNN-0.25 | 31.75 | 0.9547 | 2.27 | 105.42G | 70.1% | 0.48M | 66.6% | 21.1ms |
| DDA | 34.19 | 0.9601 | 1.73 | 158.71G | 55.1% | 0.78M | 45.4% | 24.4ms |
| MBCNN | 43.95 | 0.9909 | 0.69 | 2.14T | 0.0% | 14.21M | 0.0% | 132.1ms |
| MBCNN-0.75 | 41.67 | 0.9853 | 0.90 | 1.55T | 27.3% | 11.88M | 16.4% | 98.1ms |
| MBCNN-0.5 | 41.31 | 0.9844 | 0.94 | 1.05T | 47.5% | 9.92M | 30.2% | 76.3ms |
| MBCNN-0.25 | 40.78 | 0.9801 | 0.94 | 0.65T | 60.8% | 8.36M | 41.2% | 61.2ms |
| DDA | 41.68 | 0.9869 | 0.85 | 1.09T | 46.9% | 10.61M | 25.4% | 75.2ms |

Table 7: Performance comparison between MBCNN accelerated by DDA and state-of-the-art demoiréing networks on the FHDMi dataset.

| Method | DMCNN | MDDM | MopNet | FHDe2Net | MBCNN | MBCNN-DDA |
|---|---|---|---|---|---|---|
| PSNR | 21.69 | 20.83 | 22.76 | 22.93 | 23.14 | 23.62 |
| SSIM | 0.7731 | 0.7343 | 0.7958 | 0.7885 | 0.8201 | 0.8293 |
| FLOPs | 0.41T | 0.97T | 6.26T | 11.41T | 4.22T | 2.13T |
| Params | 2.37M | 8.01M | 12.40M | 13.57M | 14.21M | 10.61M |

Table 8: Performance comparison between MBCNN accelerated by DDA and state-of-the-art demoiréing networks on the LCDMoiré dataset.

| Method | DMCNN | MDDM | MDDM+ | MopNet | MBCNN | MBCNN-DDA |
|---|---|---|---|---|---|---|
| PSNR | 34.58 | 42.49 | 43.44 | 42.02 | 43.95 | 41.68 |
| SSIM | 0.9612 | 0.9940 | 0.9960 | 0.9872 | 0.9909 | 0.9869 |
| FLOPs | 476.62G | 472.38G | 440.44G | 3.16T | 2.14T | 1.09T |
| Params | 2.37M | 8.01M | 6.55M | 12.40M | 14.21M | 10.61M |

As to LCDMoiré, compared with DMCNN-0.75, which simply infers the whole image, our DDA dynamically assigns computation resources with respect to the moiré complexity of patches, retaining a better PSNR (34.19 dB for DDA and 33.41 dB for DMCNN-0.75) with a larger FLOPs reduction (55.1% for DDA and 31.4% for DMCNN-0.75). Meanwhile, DDA achieves a noticeable latency reduction of 56.9 ms when accelerating MBCNN (75.2 ms for DDA and 132.1 ms for the baseline), enabling real-time image demoiréing on mobile devices. Nevertheless, a noticeable PSNR drop is still observed (41.68 dB for DDA and 43.95 dB for the full model), and the comparison with state-of-the-art networks including MDDM (Cheng et al., 2019), MDDM+ (Cheng et al., 2021) and MopNet (He et al., 2019) in Tab. 8 also suggests that DDA performs relatively worse here than on FHDMi. We argue that the LCDMoiré dataset is built by simulating the aliasing between the CFA and the screen's LCD subpixels, which results in images whose distribution of moiré patterns differs from that of smartphone-captured moiré images. Compared with the smartphone-captured FHDMi, the moiré patterns in LCDMoiré are much more uniform and the cropped moiré patches are of similar complexity. This explains the relatively poor performance of DDA on the LCDMoiré dataset. Note that on FHDMi our approach even improves the performance of the baseline models with less computation burden and achieves superior performance in comparison with SOTA networks. Given our motivation of practical deployment of image demoiréing in real cases, the efficacy of the proposed method is still affirmed.
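The FHDMi FLOPs in Tab. 5 admit a simple consistency check. Assuming that the three groups contain the same number of patches and that the slimmed baselines in Tab. 5 correspond to the widths W = {0.25, 0.5, 0.75} used by DDA (an interpretation for this sketch, with a hypothetical function name), the expected per-image FLOPs of DDA is the mean of the three slimmed networks' FLOPs.

```python
from statistics import fmean


def expected_flops(subnet_flops):
    """Average cost when every group holds the same number of patches."""
    return fmean(subnet_flops)


# Values copied from Tab. 5 (FHDMi): widths 0.25, 0.5 and 0.75.
dmcnn_dda = expected_flops([208.73, 314.82, 476.62])  # in GFLOPs
mbcnn_dda = expected_flops([1.28, 2.07, 3.05])        # in TFLOPs

# Reduction relative to the full DMCNN (699.16 GFLOPs).
dmcnn_reduction = 1.0 - dmcnn_dda / 699.16
```

Under these assumptions the averages come out at roughly 333.39 GFLOPs for DMCNN and 2.13 TFLOPs for MBCNN, in line with the DDA rows of Tab. 5, and the DMCNN reduction is about 52.3%.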

4.4 QUALITATIVE COMPARISON

In addition to the quantitative results, Fig. 4 further displays the visualization results of restored images on the FHDMi dataset. Results on the LCDMoiré dataset can be found in Appendix A.2. As can be observed, uniformly applying the same acceleration rate to the whole image (MBCNN-0.5×) drastically degrades the performance, as the areas with dense moiré patterns do not receive enough computation resources to be properly restored. In contrast, by dynamically allocating the computational resources, our DDA achieves promising demoiréing quality compared with the original network. The efficacy of our proposed DDA for accelerating demoiréing networks in practical applications is therefore well demonstrated.

Figure 4: Visual quality comparison for accelerating MBCNN on the FHDMi dataset. The red boxes show a zoomed-in area for better observation. Panel labels: Moiré Image, MBCNN, MBCNN-0.5×, DDA, Clear Image.

5 LIMITATION AND FUTURE WORK

We further discuss the limitations of our DDA, which will be our future focus. Firstly, DDA simply assigns an equal number of patches to each class, which leaves room for future research on devising image-aware classification priors. Besides, our limited computing facilities prevent us from accelerating other demoiréing networks with varying structures (Liu et al., 2020; He et al., 2019). More validations are expected to further demonstrate the efficacy of DDA.

6 CONCLUSION

In this paper, we have presented a novel dynamic demoir eing acceleration method (DDA) to reduce the huge computational burden of existing networks towards real-time demoir eing on mobile devices. Our DDA is based on the observation that the moir e complexity is highly unbalanced across different areas of an image. On this basis, we propose to split the whole image into sub-patches, which are then regrouped according to their moir e complexities measured by a novel moir e prior that considers both the frequency and colorfulness information. Then, we use models with different sizes to restore patches in each group. In particular, larger networks are utilized to restore moir e centralized areas to ensure the recovery quality while smaller networks are leveraged to restore moir e diluted areas to relieve computation burden. To avoid the additional parameter burden caused by retaining multiple networks, we further leverage the supernet paradigm to jointly train the networks in a parameter-shared manner. Results on several benchmarks demonstrate that our method can effectively reduce the computation costs of existing networks with negligible performance degradation, enabling a real-time demoir eing on current smartphones.

ACKNOWLEDGEMENT

This work is supported by the National Science Fund for Distinguished Young Scholars (No. 62025603), the National Natural Science Foundation of China (No. 62025603, No. U1705262, No. 62072386, No. 62072387, No. 62072389, No. 62002305, No. 61772443, No. 61802324 and No. 61702136) and the Guangdong Basic and Applied Basic Research Foundation (No. 2019B1515120049).

REFERENCES

Tolga Bolukbasi, Joseph Wang, Ofer Dekel, and Venkatesh Saligrama. Adaptive neural networks for efficient inference. In International Conference on Machine Learning (ICML), pp. 527–536. PMLR, 2017.

Bohong Chen, Mingbao Lin, Kekai Sheng, Mengdan Zhang, Peixian Chen, Ke Li, Liujuan Cao, and Rongrong Ji. Arm: Any-time super-resolution method. In European Conference on Computer Vision (ECCV), 2022.

Xi Cheng, Zhenyong Fu, and Jian Yang. Multi-scale dynamic feature encoding network for image demoiréing. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 3486–3493. IEEE, 2019.

Xi Cheng, Zhenyong Fu, and Jian Yang. Improved multi-scale dynamic feature encoding network for image demoiréing. Pattern Recognition, 116:107970, 2021.

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255, 2009.

Ayush Dogra and Parvinder Bhalla. Image sharpening by gaussian and butterworth high pass filter. Biomedical and Pharmacology Journal, 7(2):707–713, 2014.

Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, and Yulin Wang. Dynamic neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021.

David Hasler and Sabine E Suesstrunk. Measuring colorfulness in natural images. In Human vision and electronic imaging VIII, volume 5007, pp. 87–95. International Society for Optics and Photonics, 2003.

Bin He, Ce Wang, Boxin Shi, and Ling-Yu Duan. Mop moire patterns using mopnet. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 2424–2432, 2019.

Bin He, Ce Wang, Boxin Shi, and Ling-Yu Duan. Fhde2net: Full high definition demoireing network. In European Conference on Computer Vision (ECCV), pp. 713–729. Springer, 2020.

Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens Van Der Maaten, and Kilian Q Weinberger. Multi-scale dense networks for resource efficient image classification. In Advances in Neural Information Processing Systems (NeurIPS), 2017.

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Xiangtao Kong, Hengyuan Zhao, Yu Qiao, and Chao Dong. Classsr: A general framework to accelerate super-resolution networks by data characteristic. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12016–12025, 2021.

Fanglei Liu, Jingyu Yang, and Huanjing Yue. Moiré pattern removal from texture images via low-rank and sparse matrix decomposition. In 2015 Visual Communications and Image Processing (VCIP), pp. 1–4. IEEE, 2015.

Lin Liu, Jianzhuang Liu, Shanxin Yuan, Gregory Slabaugh, Aleš Leonardis, Wengang Zhou, and Qi Tian. Wavelet-based dual-branch network for image demoiréing. In European Conference on Computer Vision (ECCV), pp. 86–102. Springer, 2020.

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems (NeurIPS), pp. 8026–8037, 2019.

Gaurav Sharma, Wencheng Wu, and Edul N Dalal. The ciede2000 color-difference formula: Implementation notes, supplementary test data, and mathematical observations. Color Research & Application: Endorsed by Inter-Society Color Council, The Colour Group (Great Britain), Canadian Society for Color, Color Science Association of Japan, Dutch Society for the Study of Color, The Swedish Colour Centre Foundation, Colour Society of Australia, Centre Français de la Couleur, 30(1):21–30, 2005.

Yujing Sun, Yizhou Yu, and Wenping Wang. Moiré photo restoration using multiresolution convolutional neural networks. IEEE Transactions on Image Processing (TIP), 27(8):4160–4172, 2018.

Jingyu Yang, Fanglei Liu, Huanjing Yue, Xiaomei Fu, Chunping Hou, and Feng Wu. Textured image demoiréing via signal decomposition and guided filtering. IEEE Transactions on Image Processing (TIP), 26(7):3528–3541, 2017a.

Jingyu Yang, Xue Zhang, Changrui Cai, and Kun Li. Demoiréing for screen-shot images with multi-channel layer decomposition. In 2017 IEEE Visual Communications and Image Processing (VCIP), pp. 1–4. IEEE, 2017b.

Taojiannan Yang, Sijie Zhu, Matias Mendieta, Pu Wang, Ravikumar Balakrishnan, Minwoo Lee, Tao Han, Mubarak Shah, and Chen Chen. Mutualnet: Adaptive convnet via mutual learning from different model configurations. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021.

Jiahui Yu and Thomas S Huang. Universally slimmable networks and improved training techniques. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1803–1811, 2019.

Shanxin Yuan, Radu Timofte, Gregory Slabaugh, and Aleš Leonardis. Aim 2019 challenge on image demoireing: Dataset and study. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 3526–3533. IEEE, 2019.

Bolun Zheng, Shanxin Yuan, Gregory Slabaugh, and Aleš Leonardis. Image demoireing with learnable bandpass filters. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3636–3645, 2020.

A.1 MORE VISUALIZATION RESULTS FOR THE MOIRÉ PRIOR

Figure 5: The sorted sub-image patches cropped from the FHDMi dataset according to moiré complexity scores given by our proposed moiré prior. (Axis label in each panel: Moiré Complexity Score.)

A.2 QUALITATIVE RESULTS ON THE LCDMOIRÉ DATASET

Figure 6: Visual quality comparison for accelerating MBCNN on the LCDMoiré dataset. Panels: Moiré Image, MBCNN, MBCNN-0.5x, DDA, Clear Image, with zoomed-in views. The red boxes mark the zoomed-in areas for closer inspection.

Table 9: Training time comparison on the FHDMi and LCDMoiré datasets, reported in NVIDIA Tesla V100 GPU days.

| Dataset | DMCNN | DMCNN-DDA | MBCNN | MBCNN-DDA |
|---|---|---|---|---|
| LCDMoiré | 0.42 | 0.61 | 3.89 | 5.77 |
| FHDMi | 2.04 | 3.11 | 8.19 | 10.02 |

A.3 TRAINING TIME COMPARISON

In this section, we report the training time of MBCNN (Zheng et al., 2020), DMCNN (Sun et al., 2018) and their DDA-accelerated versions. The results in Tab. 9 show that DDA incurs a higher training cost than the vanilla demoiréing networks. The additional training time stems from the larger number of training iterations per epoch in our supernet, since each image in the original datasets is split into multiple patches. Nevertheless, we stress that the goal of this paper is real-time deployment, where the predominant advantage of DDA lies in inference efficiency.
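To put the overhead in relative terms, the GPU-day figures of Tab. 9 can be turned into DDA-to-vanilla ratios. The short script below is only a worked calculation on the reported numbers; the dictionary layout and the helper name `training_overhead` are assumptions introduced for this example.

```python
# GPU days copied from Table 9: (vanilla network, DDA-accelerated version).
gpu_days = {
    ("LCDMoire", "DMCNN"): (0.42, 0.61),
    ("LCDMoire", "MBCNN"): (3.89, 5.77),
    ("FHDMi", "DMCNN"): (2.04, 3.11),
    ("FHDMi", "MBCNN"): (8.19, 10.02),
}


def training_overhead(table):
    """Return the DDA / vanilla training-time ratio for each (dataset, network)."""
    return {key: round(dda / base, 2) for key, (base, dda) in table.items()}


overhead = training_overhead(gpu_days)
for (dataset, network), ratio in overhead.items():
    print(f"{dataset:9s} {network:6s} {ratio:.2f}x")
```

On these figures the ratios fall between roughly 1.2x and 1.5x, so the extra training cost stays well below a doubling in all four settings.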