# Conditional Diffusion Process for Inverse Halftoning

Hao Jiang, Peking University, jianghao@stu.pku.edu.cn
Yadong Mu (corresponding author), Peking University & Peng Cheng Laboratory, myd@pku.edu.cn

Abstract

Inverse halftoning is a technique used to recover realistic images from vintage prints (e.g., photographs, newspapers, books). The rise of deep learning has led to the gradual incorporation of neural network designs into inverse halftoning methods. Most existing inverse halftoning approaches adopt the U-net architecture, which uses an encoder to encode halftone prints, followed by a decoder for image reconstruction. However, the supervised learning paradigm with element-wise regression commonly adopted in U-net based methods generalizes poorly in practical applications: when there is a large gap between the dithering patterns of the training and testing halftones, the reconstructed continuous-tone images exhibit obvious artifacts. This is an important issue in practice, since the algorithms for generating halftones are ever-evolving, and even for the same algorithm, different parameter choices result in different halftone dithering patterns. In this paper, we propose the first generative halftoning method in the literature, which regards the black pixels in halftones as physically moving particles and makes the randomly distributed particles move under certain guidance through a reverse diffusion process, so as to obtain the desired halftone patterns. In particular, we propose a Conditional Diffusion model for image Halftoning (CDH), which consists of a halftone dithering process and an inverse halftoning process. By changing the initial state of the diffusion model, our method can generate visually plausible halftones with different dithering patterns, conditioned on the image gray level and a Laplacian prior. To avoid introducing redundant patterns and undesired artifacts, we propose a meta-halftone guided network that incorporates blue noise guidance into the diffusion process. In this way, halftone images drawn from more diverse distributions are fed into the inverse halftoning model, which helps the model learn a more robust mapping from halftone distributions to continuous-tone distributions, thereby improving generalization to unseen samples. Quantitative and qualitative experimental results demonstrate that the proposed method achieves state-of-the-art results.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).

1 Introduction

Halftoning refers to the task of simulating the brightness variation of a continuous-tone image by changing the size or frequency of halftone dots (such as ink dots). In the last century, halftoning technology was widely used in the printing industry to store precious image data in old newspapers, books, photographs, etc. At the same time, inverse halftoning technology emerged to recover the stored continuous-tone images from vintage materials. The goal of inverse halftoning is to minimize the loss of information during restoration, so that the restored print has the highest possible visual quality.

Figure 1: Illustration of artifacts in images restored by a previous inverse halftoning method (Xia and Wong, 2018) when a gap exists between the dithering patterns of training and testing halftones.
The top row shows inverse halftoning results for several testing samples, and the bottom row shows the corresponding ground-truth continuous-tone images.

Traditional inverse halftoning methods are mainly based on image filtering (e.g., wavelet domain filtering (Xiong et al., 1999), edge-preserving filtering (Kite et al., 2000), SUSAN filtering (Siddiqui and Bouman, 2007), bilateral filtering (Sun et al., 2014)) and statistical learning (e.g., least mean square filtering (Chen and Hang, 1997), maximum a posteriori estimation (Stevenson, 1997), look-up tables (Mese and Vaidyanathan, 2001; Chung and Wu, 2005), dictionary learning (Zhang et al., 2018)). With the revival of deep learning, inverse halftoning methods based on deep neural networks have made great progress and attracted increasing attention (Hou and Qiu, 2017; Xiao et al., 2017; Kim and Park, 2018). The most representative design is the U-net (Ronneberger et al., 2015) based architecture, which uses an encoder to learn a hidden encoding of the halftone print, followed by a decoder to reconstruct the image (Xia and Wong, 2018; Gao et al., 2019).

However, the paradigm of supervised learning with element-wise regression commonly adopted in U-net based methods suffers from poor generalization in practical applications. Specifically, when there is a large gap between the dithering patterns of training and testing halftones, the restored continuous-tone images often contain obvious artifacts. This is an important issue in applications, since the algorithms used for generating halftones evolve with time, and even for the same algorithm, different parameter choices result in different halftone dithering patterns. Taking frequency modulation (FM) halftoning as an example, we select 9 classical error diffusion processes with different dithering patterns, namely Floyd-Steinberg dithering, Jarvis-Judice-Ninke dithering, Stucki dithering, Atkinson dithering, Burkes dithering, Sierra dithering, and several of their variants (Lau and Arce, 2018); a sketch of Floyd-Steinberg error diffusion is given at the end of this section. We train the U-net model on the halftones generated by 5 of these algorithms and test on the halftones generated by the remaining 4. The experimental results are shown in Figure 1: artifacts can be clearly observed in the restored continuous-tone images.

To address this problem, we propose a Conditional Diffusion model for image Halftoning (CDH), as shown in Figure 2, which consists of a halftone dithering process and an inverse halftoning process. We regard the black pixels in halftones as physically moving particles and make the randomly distributed particles move under certain guidance through the reverse diffusion process, so as to obtain the desired halftone distribution. Specifically, for the halftone dithering process, we train a conditional diffusion model to generate halftones with different dithering patterns, conditioned on the image gray level and a Laplacian prior. By changing the initial state of the diffusion model, it can simulate different dithering processes and generate diverse halftone images. To avoid introducing redundant patterns and undesired artifacts during halftone generation, we propose a meta-halftone guided network that incorporates blue noise guidance into the diffusion process. For inverse halftoning, we train an inverse halftoning diffusion model to learn the mapping from the halftone distribution to the continuous-tone distribution. In this way, halftones drawn from more diverse distributions are fed into the inverse halftoning model, which helps the model learn a more robust mapping and improves generalization to unseen samples.
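For concreteness, the following NumPy sketch illustrates the Floyd-Steinberg scheme referenced above. It is a minimal illustration rather than the implementation used in the experiments; the other error diffusion algorithms differ only in the stencil of error-distribution weights.

```python
import numpy as np

def floyd_steinberg(gray: np.ndarray) -> np.ndarray:
    """Binarize a grayscale image in [0, 1] by Floyd-Steinberg error diffusion.

    Each pixel is thresholded at 0.5 and the quantization error is pushed to
    the unprocessed neighbors with weights 7/16, 3/16, 5/16, 1/16. Swapping in
    a different weight stencil (Jarvis-Judice-Ninke, Stucki, Atkinson, Burkes,
    Sierra, ...) yields the other dithering patterns discussed above.
    """
    img = gray.astype(np.float64).copy()
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 1.0 if old >= 0.5 else 0.0
            out[y, x] = new
            err = old - new
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x - 1 >= 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return out
```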
Our contributions are summarized as follows:

- This is the first work to propose a generative halftoning method, which regards the black pixels in halftones as physically moving particles and makes the randomly distributed particles move under certain guidance through the reverse diffusion process, so as to obtain the desired halftone dithering patterns.
- To avoid introducing redundant patterns and undesired artifacts during halftone generation, we propose a meta-halftone guided network that incorporates blue noise guidance into the halftone diffusion process.
- To obtain better generalization, we use the $x_0$ state of the halftone dithering diffusion as the condition for the inverse halftoning diffusion, so that the inverse halftoning diffusion model benefits from a wider range of dithering patterns and learns a more robust mapping.
- We conduct experiments on a dataset covering 9 halftoning algorithms, and quantitative and qualitative experiments demonstrate that the proposed method achieves state-of-the-art results.

2 Related Work

Inverse Halftoning. Traditional inverse halftoning methods are mainly based on image filtering (e.g., edge-preserving filtering (Kite et al., 2000), wavelet domain filtering (Xiong et al., 1999), bilateral filtering (Sun et al., 2014), SUSAN filtering (Siddiqui and Bouman, 2007)) and statistical learning (e.g., look-up tables (Chung and Wu, 2005; Mese and Vaidyanathan, 2001), least mean square filtering (Chen and Hang, 1997), maximum a posteriori estimation (Stevenson, 1997), dictionary learning (Zhang et al., 2018)). For example, Sun et al. (2014) proposed to use an anisotropic Gaussian filter and an edge-preserving filter for inverse halftoning. With the revival of deep learning, some researchers have used deep neural networks for inverse halftoning, e.g., U-net based models (Hou and Qiu, 2017; Xiao et al., 2017; Gao et al., 2019), residual learning based models (Xia and Wong, 2018), and contextual learning based models (Kim and Park, 2018). For example, Xia and Wong (2018) proposed a progressive residual U-net that synthesizes the global tone and subtle details to generate inverse halftones. However, these approaches suffer from unacceptable artifacts in the restored continuous-tone images when faced with dithering pattern gaps.

Diffusion Models. Diffusion models are a class of deep generative models developed from nonequilibrium thermodynamics (Sohl-Dickstein et al., 2015). They define a Markovian process over the diffusion steps that incrementally adds random noise to the original data, and then learn the reverse diffusion process to resample the data from noise (Sohl-Dickstein et al., 2015; Ho et al., 2020; Luo and Hu, 2021). Diffusion models are closely related to score-based generative models, which generate samples by Langevin dynamics based on estimated gradients of the data distribution (Song and Ermon, 2019, 2020; Song et al., 2020b, 2021). Several techniques have been proposed to help diffusion models converge to a lower negative log-likelihood or to speed up sampling (Nichol and Dhariwal, 2021; Dhariwal and Nichol, 2021; Song et al., 2020a).
For example, Song et al. (2020a) generalize denoising diffusion probabilistic models via a class of non-Markovian processes, which correspond to deterministic generative processes and give rise to implicit models that produce high-quality samples much faster. However, these diffusion models cannot be directly applied to halftone generation, since they do not take into account the blue noise properties of halftone prints, making it difficult to generate visually pleasing dithering patterns.

3 Method

Figure 2: Illustration of the proposed Conditional Diffusion model for image Halftoning (CDH), which consists of a halftone dithering diffusion process and an inverse halftoning diffusion process.

3.1 Halftone Dithering Diffusion Conditioned on Gray-scale and Laplacian Prior

Traditional halftone dithering methods are mainly based on amplitude modulation (Blatner and Roth, 1993; Campbell et al., 1966) and frequency modulation (Eschbach, 1997; Floyd, 1976; Mese and Vaidyanathan, 1999). With the development of deep learning, some modern approaches try to generate halftone images using convolutional neural networks (Kim and Park, 2018; Xia et al., 2021). However, these methods are limited to specific dithering patterns and cannot achieve flexible halftone generation. To address this problem, and considering that halftone prints are composed only of black and white pixels, we regard the black pixels in halftones as moving particles and make these particles move from a random Gaussian distribution to the halftone distribution under a reverse diffusion process, yielding a generative halftoning approach.

Given the distribution $q^{s_1}(x^{s_1}_0)$ of halftone prints, the diffusion process $q^{s_1}$ is a Markovian noising process (Ho et al., 2020) that gradually adds noise to $x^{s_1}_0$ to obtain $x^{s_1}_{1:T}$, where $s_1$ denotes the halftone dithering diffusion stage. Specifically, at each step $t$, the diffusion step adds random Gaussian noise with a $\beta_t$-controlled variance:

$$q^{s_1}(x^{s_1}_{1:T} \mid x^{s_1}_0) = \prod_{t=1}^{T} q^{s_1}(x^{s_1}_t \mid x^{s_1}_{t-1}), \tag{1}$$

$$q^{s_1}(x^{s_1}_t \mid x^{s_1}_{t-1}) = \mathcal{N}\big(x^{s_1}_t;\ \sqrt{1-\beta_t}\, x^{s_1}_{t-1},\ \beta_t \mathbf{I}\big), \tag{2}$$

where $\beta_t \in (0, 1)$, $t = 1, \ldots, T$. With the reparameterization trick (Kingma and Welling, 2013), we can sample $x^{s_1}_t$ at any time step $t$ in closed form: $x^{s_1}_t = \sqrt{\bar{\alpha}_t}\, x^{s_1}_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$ with $\epsilon \sim \mathcal{N}(0, \mathbf{I})$, that is,

$$q^{s_1}(x^{s_1}_t \mid x^{s_1}_0) = \mathcal{N}\big(x^{s_1}_t;\ \sqrt{\bar{\alpha}_t}\, x^{s_1}_0,\ (1-\bar{\alpha}_t)\mathbf{I}\big), \tag{3}$$

where $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$. In this way, we can derive $x^{s_1}_t$ directly from $q^{s_1}(x^{s_1}_t \mid x^{s_1}_0)$ without repeatedly applying the Markovian process $q^{s_1}$ and computing $q^{s_1}(x^{s_1}_t \mid x^{s_1}_{t-1})$.

In the halftone dithering scenario, we would like to obtain the halftone sample $x^{s_1}_0$ via the reverse diffusion process $q^{s_1}(x^{s_1}_{t-1} \mid x^{s_1}_t)$ starting from a random Gaussian state $x^{s_1}_T \sim \mathcal{N}(0, \mathbf{I})$. Considering that the particles tend to rely on the gradient information of images when forming halftone patterns, we use the image gray level $r$ and the Laplacian prior $l$ as conditions to guide the reverse diffusion process. Formally, the reverse halftoning diffusion process is defined by the distribution $p^{s_1}_{\theta_1}(x^{s_1}_{0:T} \mid r, l)$, a Markov chain with start state $p^{s_1}(x^{s_1}_T) = \mathcal{N}(x^{s_1}_T; 0, \mathbf{I})$:

$$p^{s_1}_{\theta_1}(x^{s_1}_{0:T} \mid r, l) = p^{s_1}(x^{s_1}_T) \prod_{t=1}^{T} p^{s_1}_{\theta_1}(x^{s_1}_{t-1} \mid x^{s_1}_t, r, l), \tag{4}$$

$$p^{s_1}_{\theta_1}(x^{s_1}_{t-1} \mid x^{s_1}_t, r, l) = \mathcal{N}\big(x^{s_1}_{t-1};\ \mu^{s_1}_{\theta_1}(x^{s_1}_t, r, l, t),\ \Sigma^{s_1}_{\theta_1}(x^{s_1}_t, r, l, t)\big). \tag{5}$$
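A minimal sketch of the closed-form sampling in Equation 3 is given below. It assumes the linear noise schedule mentioned in Section 4.1; the endpoint values 1e-4 and 0.02 are the common DDPM defaults, not values stated in this paper.

```python
import torch

T = 200  # diffusion steps for the dithering stage (Section 4.1)
betas = torch.linspace(1e-4, 0.02, T)      # linear schedule; endpoints assumed
alphas = 1.0 - betas                       # alpha_t = 1 - beta_t
alpha_bars = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t = prod_{i<=t} alpha_i

def q_sample(x0: torch.Tensor, t: torch.Tensor, eps: torch.Tensor) -> torch.Tensor:
    """Draw x_t ~ q(x_t | x_0) in closed form (Equation 3).

    x0:  (B, 1, H, W) batch of halftone states
    t:   (B,) integer time steps in [0, T)
    eps: (B, 1, H, W) standard Gaussian noise
    """
    ab = alpha_bars[t].view(-1, 1, 1, 1)
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps
```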
Different from previous methods that use class labels (Dhariwal and Nichol, 2021) or shape latents (Luo and Hu, 2021) as conditions for the diffusion model, we use pixel-wise image priors as conditions. The conditioning is performed in a simple way: we concatenate the state vector $x^{s_1}_t$ at diffusion step $t$ with $r$ and $l$ along the channel dimension to provide pixel-wise guidance:

$$x^{s_1}_t := x^{s_1}_t \oplus r \oplus l, \tag{6}$$

where $\oplus$ denotes the concatenation operation. Due to the properties of halftones, local dithering patterns matter more for generating high-quality halftones than global semantic information; we also observe in experiments that excessive introduction of global information can be harmful to dithering results.

We would like to learn the model $p^{s_1}_{\theta_1}$ to approximate the conditional probabilities $q^{s_1}(x^{s_1}_{t-1} \mid x^{s_1}_t, x^{s_1}_0)$. By Bayes' theorem,

$$q^{s_1}(x^{s_1}_{t-1} \mid x^{s_1}_t, x^{s_1}_0) = \frac{q^{s_1}(x^{s_1}_t \mid x^{s_1}_{t-1}, x^{s_1}_0)\, q^{s_1}(x^{s_1}_{t-1} \mid x^{s_1}_0)}{q^{s_1}(x^{s_1}_t \mid x^{s_1}_0)}, \tag{7}$$

and $q^{s_1}(x^{s_1}_{t-1} \mid x^{s_1}_t, x^{s_1}_0)$ can be represented as a Gaussian distribution:

$$q^{s_1}(x^{s_1}_{t-1} \mid x^{s_1}_t, x^{s_1}_0) = \mathcal{N}\big(x^{s_1}_{t-1};\ \tilde{\mu}^{s_1}_t(x^{s_1}_t, x^{s_1}_0),\ \tilde{\beta}_t \mathbf{I}\big), \tag{8}$$

$$\tilde{\mu}^{s_1}_t(x^{s_1}_t, x^{s_1}_0) = \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1-\bar{\alpha}_t}\, x^{s_1}_0 + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, x^{s_1}_t, \qquad \tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\, \beta_t. \tag{9}$$

The simplified objective function (Ho et al., 2020) is used to train the model during the halftone dithering diffusion process:

$$\mathcal{L}^{s_1} = \mathbb{E}_{t \sim [1,T],\ x^{s_1}_0 \sim q^{s_1}(x^{s_1}_0),\ \epsilon \sim \mathcal{N}(0, \mathbf{I})} \Big[ \big\| \epsilon - \epsilon^{s_1}_{\theta_1}\big(\sqrt{\bar{\alpha}_t}\, x^{s_1}_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,\ t\big) \big\|^2 \Big]. \tag{10}$$
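Equations 3, 6, and 10 together yield a compact training step, sketched below. The sketch reuses q_sample and T from the snippet above; eps_model stands for the noise-prediction U-net, and its (x, t) call interface is our assumption rather than a detail given in the paper.

```python
import torch
import torch.nn.functional as F

def training_step(eps_model, x0, r, lap, optimizer):
    """One optimization step of the simplified objective (Equation 10).

    eps_model: noise-prediction U-net; hypothetical (x, t) interface
    x0:        (B, 1, H, W) halftone sample
    r, lap:    gray-level and Laplacian priors, each (B, 1, H, W)
    """
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)
    eps = torch.randn_like(x0)
    x_t = q_sample(x0, t, eps)                 # closed-form forward sample (Eq. 3)
    x_t = torch.cat([x_t, r, lap], dim=1)      # pixel-wise conditioning (Eq. 6)
    loss = F.mse_loss(eps_model(x_t, t), eps)  # || eps - eps_theta ||^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```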
3.2 Meta-halftone Guided Network

Blue noise properties are critical for generating high-quality halftones: they avoid noticeable low-frequency visual artifacts in the resulting halftones by forcing random pixel dithering (Lau and Arce, 2018). Traditional methods for deriving blue noise dithering patterns include simulated annealing (Sullivan et al., 1991), void-and-cluster techniques (Ulichney, 1993), power spectrum manipulation (Yao and Parker, 1994), and dither pattern ordering (Lau et al., 1999). However, most of these approaches rely on statistical procedures or manually designed blue-noise dithering matrices, which are difficult to apply directly in learning-based neural frameworks. Recently, some researchers have tried to inject blue noise properties into an L1-norm learning objective (Xia et al., 2021); however, since computing this L1 norm requires known halftone generation results, it cannot be used in generative models. So far, how to incorporate blue-noise dithering properties into halftone generative models remains an unexplored problem.

To address this issue, we propose a meta-halftone guided network (shown in Figure 3), which introduces blue noise guidance into the halftone dithering diffusion process and prevents dithering results from containing artifacts or redundant patterns.

Figure 3: Illustration of the proposed Meta-halftone Guided Network.

Formally, at step $t$ of the halftone dithering diffusion process, we take the state vector $x^{s_1}_t$ as input to the meta-halftone guided network. We first feed $x^{s_1}_t$ into a feature extraction network $E$ to obtain the hidden feature $f(x^{s_1}_t) = E(x^{s_1}_t)$. We use a pre-trained VGG network (Simonyan and Zisserman, 2015) as $E$ in the experiments; other networks such as Inception Net (Szegedy et al., 2016) or ResNet (He et al., 2016) are also applicable. The purpose of using a pre-trained network is to save computing resources and training time.

Next, we define the meta-halftone set $M$, which consists of a set of meta-halftone vectors $m_i$ ($1 \le i \le k$):

$$M = [m_1, m_2, m_3, \ldots, m_k], \tag{11}$$

where $k$ is the number of meta-halftones. We obtain $m_i$ from the halftone dithering diffusion states of a set of low-frequency images. Specifically, we construct $k$ images $I_1, I_2, I_3, \ldots, I_k$ with large low-frequency regions (e.g., images with constant gray levels), and obtain the corresponding halftones $H_1, H_2, H_3, \ldots, H_k$ using conventional halftone algorithms (e.g., the Floyd-Steinberg algorithm (Lau and Arce, 2018)). We first train a traditional diffusion model alone for a certain number of steps, where the model uses the U-net architecture common in (Ho et al., 2020; Dhariwal and Nichol, 2021) without the proposed meta-halftone guided network. We have

$$p^{s_1}_{\theta_1}(x^{H_i}_{t-1} \mid x^{H_i}_t, r^{H_i}, l^{H_i}) = \mathcal{N}\big(x^{H_i}_{t-1};\ \mu^{s_1}_{\theta_1}(x^{H_i}_t, r^{H_i}, l^{H_i}, t),\ \Sigma^{s_1}_{\theta_1}(x^{H_i}_t, r^{H_i}, l^{H_i}, t)\big), \tag{12}$$

where the superscript $H_i$ denotes the constructed halftone sample from $H_i \in \{H_1, H_2, \ldots, H_k\}$, and $r^{H_i}$ and $l^{H_i}$ are derived from $I_i \in \{I_1, I_2, \ldots, I_k\}$ accordingly. The superscript $s_1$ of $x^{H_i}_{t-1}$ and $x^{H_i}_t$ is omitted here for brevity. We obtain $\epsilon^{s_1}_{\theta_1}(\sqrt{\bar{\alpha}_t}\, x^{H_i}_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, t)$ from the model predictions and calculate $\mu^{s_1}_{\theta_1}$:

$$\mu^{s_1}_{\theta_1}(x^{H_i}_t, t) = \frac{1}{\sqrt{\alpha_t}} \left( x^{H_i}_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon^{s_1}_{\theta_1}(x^{H_i}_t, t) \right). \tag{13}$$

We use $m_i$ in Equation 11 to represent $\mu^{s_1}_{\theta_1}$ and construct the meta-halftone set $M$ accordingly:

$$m_i = \mu^{s_1}_{\theta_1}\big(\sqrt{\bar{\alpha}_t}\, x^{H_i}_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,\ t\big), \qquad 1 \le i \le k. \tag{14}$$

Next, the model learns affine relationships between the hidden feature $f(x^{s_1}_t)$ and the meta-halftone set $M$ through an affine learning layer, producing affine factors $g(x^{s_1}_t)_i$:

$$g(x^{s_1}_t)_i = w^g_i f(x^{s_1}_t) + b^g_i, \tag{15}$$

where $w^g_i$ and $b^g_i$ are learnable weights and biases, respectively. With the calculated $g(x^{s_1}_t)_i$, we perform a depth-wise aggregation of the meta-halftone set $M$ to learn refined representations along the depth channels:

$$f_m = \sum_{i=1}^{k} m_i\, g(x^{s_1}_t)_i. \tag{16}$$

Besides depth information, spatial information is also crucial for meta-halftone guidance, since meta-halftones provide guidance for the generation of new dithering patterns in local areas. In light of this, we next perform a spatial-wise aggregation of the meta-halftone refined representation $f_m$ and the dithering diffusion state vector $x^{s_1}_t$, denoting the result as $f'_m$:

$$f'_m = \sum_{d} \big( f_m \,\hat{\oplus}\, x^{s_1}_t \big) \ast d, \tag{17}$$

where $\hat{\oplus}$ represents element-wise concatenation in the spatial dimension, $\ast$ denotes the spatial convolution operation, and $d$ represents a convolution kernel. The output $o$ of the meta-halftone guided network is determined by the meta-halftone guidance $f'_m$ and the dithering diffusion state vector $x^{s_1}_t$:

$$o = f'_m \oplus x^{s_1}_t, \tag{18}$$

and $o$ is subsequently fed into the U-net, as in previous work (Dhariwal and Nichol, 2021; Nichol and Dhariwal, 2021), for model prediction.
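The following module sketches one plausible reading of Equations 15-18, assuming the conventions above. The paper does not specify layer shapes; the pooled feature input, the channel-wise (rather than spatial) concatenation, and the single 3x3 kernel standing in for $d$ are all our assumptions.

```python
import torch
import torch.nn as nn

class MetaHalftoneGuidance(nn.Module):
    """Sketch of the meta-halftone guided network (Equations 15-18).

    meta: precomputed meta-halftone states m_i of shape (k, C, H, W),
    built from low-frequency images as in Equation 14.
    """

    def __init__(self, meta: torch.Tensor, feat_dim: int):
        super().__init__()
        self.register_buffer("meta", meta)    # meta-halftone set M (Eq. 11)
        k, c = meta.shape[0], meta.shape[1]
        self.affine = nn.Linear(feat_dim, k)  # affine factors g (Eq. 15)
        # aggregation kernel d (Eq. 17); a single 3x3 conv is assumed
        self.spatial = nn.Conv2d(c + 1, 1, kernel_size=3, padding=1)

    def forward(self, x_t: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        # x_t: (B, 1, H, W) diffusion state; feat: (B, feat_dim) pooled f(x_t)
        g = self.affine(feat)                              # (B, k)
        # depth-wise aggregation over the meta-halftone set (Eq. 16)
        f_m = torch.einsum("bk,kchw->bchw", g, self.meta)  # (B, C, H, W)
        # spatial-wise aggregation with the state vector (Eq. 17);
        # concatenation along channels here is an assumption
        f_m = self.spatial(torch.cat([f_m, x_t], dim=1))   # (B, 1, H, W)
        # combine guidance and state (Eq. 18) before the U-net
        return torch.cat([f_m, x_t], dim=1)                # (B, 2, H, W)
```

In this reading, meta would be built once via Equation 14, and its spatial size must match the training resolution (256 x 256 in Section 4.1).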
3.3 Inverse Halftoning Diffusion Conditioned on $x^{s_1}_0$

The goal of the inverse halftoning process is to learn a mapping from halftone distributions to continuous-tone distributions while reducing the loss of information. It is similar to the halftone dithering diffusion process in Section 3.1, but with different diffusion conditions. For the inverse halftoning diffusion process, we consider the process $q^{s_2}(x^{s_2}_{1:T} \mid x^{s_2}_0)$ that gradually adds noise to the data $x^{s_2}_0$ drawn from a continuous-tone distribution $q^{s_2}(x^{s_2}_0)$, and the process $q^{s_2}(x^{s_2}_{t-1} \mid x^{s_2}_t)$ that gradually denoises from Gaussian noise to obtain the desired samples. We use a model $p^{s_2}_{\theta_2}(x^{s_2}_{t-1} \mid x^{s_2}_t, h)$ to estimate $q^{s_2}(x^{s_2}_{t-1} \mid x^{s_2}_t, x^{s_2}_0)$, conditioned on the halftone $h$ (i.e., $x^{s_1}_0$ sampled from the halftone dithering diffusion model):

$$p^{s_2}_{\theta_2}(x^{s_2}_{0:T} \mid h) = p^{s_2}(x^{s_2}_T) \prod_{t=1}^{T} p^{s_2}_{\theta_2}(x^{s_2}_{t-1} \mid x^{s_2}_t, h), \tag{19}$$

$$p^{s_2}_{\theta_2}(x^{s_2}_{t-1} \mid x^{s_2}_t, h) = \mathcal{N}\big(x^{s_2}_{t-1};\ \mu^{s_2}_{\theta_2}(x^{s_2}_t, h, t),\ \Sigma^{s_2}_{\theta_2}(x^{s_2}_t, h, t)\big). \tag{20}$$

Pixel-wise guidance between the state $x^{s_2}_t$ and the halftone condition $h$ is applied in the inverse halftoning diffusion process:

$$x^{s_2}_t := x^{s_2}_t \oplus h, \tag{21}$$

where $\oplus$ denotes the concatenation operation. The remaining parts (such as the training objective) mirror the halftone dithering diffusion process, and we omit them here to save space.

4 Experiments

4.1 Experimental Setup

Datasets. To evaluate the generalization of the proposed model CDH to different halftoning methods, we construct a dataset with relatively strong domain shifts. The domain shifts arise from two aspects: the halftone algorithm and the image semantics. We construct the training and validation sets from the UTKFace dataset (Zhang et al., 2017) and the test set from the VOC2012 dataset (Everingham et al., 2010). We collect 9 different halftone dithering algorithms, namely Floyd-Steinberg dithering, Jarvis-Judice-Ninke dithering, Stucki dithering, Atkinson dithering, Burkes dithering, Sierra dithering, and several of their variants (Lau and Arce, 2018). Halftones generated by 5 of the algorithms are used in the training and validation sets, and halftones generated by the remaining 4 algorithms form the test sets. There are 7,857 images in the training set, 400 in the validation set, and 400 in the test set.

Baselines. We choose baselines commonly used in inverse halftoning tasks, including U-net based methods, generative adversarial network based methods, and super-resolution based methods. PRL (Xia and Wong, 2018) is an inverse halftoning network with progressive residual learning, which synthesizes the global tone and subtle details from halftone images in a progressive manner. Dhariwal et al. (Dhariwal and Nichol, 2021) show that diffusion models can achieve better image sampling quality than existing generative models, using an improved U-net architecture for high-quality image synthesis. Nichol et al. (Nichol and Dhariwal, 2021) show that, with some simple modifications, denoising diffusion probabilistic models can also achieve competitive log-likelihood while maintaining high sample quality; they learn the variances of the reverse diffusion process using a reparameterization and a hybrid objective that combines the variational lower bound with the simplified objective of (Ho et al., 2020), and they replace the linear noise schedule with a cosine one. Song et al. (Song et al., 2020a) introduce denoising diffusion implicit models (DDIM), implicit models that produce high-quality samples much faster. ESRGAN (Wang et al., 2018), used as a baseline in previous inverse halftoning work, introduces the residual-in-residual dense block to generate realistic textures and avoid unpleasant artifacts. GLEAN (Chan et al., 2021) targets super-resolution, using pre-trained generative adversarial networks to solve ill-posed problems in image restoration; we first smooth the halftone images with Gaussian kernels of different sizes and then use GLEAN for image restoration. Real-ESRGAN (Wang et al., 2021) extends ESRGAN to practical image restoration, introducing a high-order degradation modeling process to better simulate complex real-world degradations and employing a U-net discriminator with spectral normalization to increase discriminator capability.
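The Gaussian smoothing applied before the super-resolution baselines can be sketched as follows; this is our illustration with OpenCV, as the paper does not name a library.

```python
import cv2
import numpy as np

def smooth_halftone(halftone: np.ndarray, ksize: int = 3) -> np.ndarray:
    """Blur a binary halftone before feeding it to a restoration baseline.

    halftone: (H, W) uint8 image with values in {0, 255}.
    ksize: Gaussian kernel size; the experiments compare sizes 3 and 7
    (the 'lkernel' variants in Table 1 use the larger kernel).
    """
    # sigmaX=0 lets OpenCV derive the standard deviation from the kernel size
    return cv2.GaussianBlur(halftone, (ksize, ksize), 0)
```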
Table 1: Performance comparison of the proposed method CDH and baseline methods in terms of PSNR and SSIM.

| Method | Variant | PSNR | SSIM |
|---|---|---|---|
| PRL (Xia and Wong, 2018) | - | 22.82 | 0.698 |
| Dhariwal et al. (Dhariwal and Nichol, 2021) | - | 23.35 | 0.693 |
| Nichol et al. (Nichol and Dhariwal, 2021) | Cosine Noise Schedule | 22.40 | 0.683 |
| Nichol et al. (Nichol and Dhariwal, 2021) | Learn Sigma | 22.45 | 0.702 |
| Nichol et al. (Nichol and Dhariwal, 2021) | Importance Sampled VLB | 19.99 | 0.615 |
| Song et al. (Song et al., 2020a) | DDIM | 18.20 | 0.571 |
| ESRGAN (Wang et al., 2018) | - | 20.20 | 0.428 |
| ESRGAN (Wang et al., 2018) | lkernel | 22.47 | 0.645 |
| GLEAN (Chan et al., 2021) | - | 20.11 | 0.491 |
| GLEAN (Chan et al., 2021) | lkernel | 21.74 | 0.607 |
| Real-ESRGAN (Wang et al., 2021) | - | 21.04 | 0.626 |
| Real-ESRGAN (Wang et al., 2021) | lkernel | 20.94 | 0.619 |
| CDH (Ours) | - | 24.24 | 0.727 |

Evaluation Metrics. Following previous inverse halftoning work (Xia and Wong, 2018), we use two metrics to evaluate the performance of different models: Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM). PSNR expresses the ratio between the power of the peak signal and the power of the noise, and SSIM evaluates the similarity between images in terms of luminance, contrast, and structure.
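Both metrics are standard; a minimal evaluation sketch with scikit-image (our choice of library, not one specified in the paper) is given below.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(restored: np.ndarray, reference: np.ndarray):
    """Compute PSNR and SSIM between a restored continuous-tone image and its
    ground truth, both given as (H, W) float arrays in [0, 1]."""
    psnr = peak_signal_noise_ratio(reference, restored, data_range=1.0)
    ssim = structural_similarity(reference, restored, data_range=1.0)
    return psnr, ssim
```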
Implementation Details. The image size for halftone dithering and inverse halftoning is 256 x 256, and the input halftones have 1 channel. For the halftone dithering diffusion model, the learning rate is set to 0.0001 and k is set to 20; we use 200 diffusion steps in the training and testing phases, 64 model channels, and a linear noise schedule throughout diffusion. We adopt the AdamW (Loshchilov and Hutter, 2018) optimizer to train the halftone dithering diffusion model. For the inverse halftoning diffusion model, we set the learning rate to 0.0001, use 800 diffusion steps, and set the number of attention heads to 4; AdamW (Loshchilov and Hutter, 2018) is also used for optimization.

4.2 Experimental Results

We evaluate the performance of the proposed method CDH and the baseline methods in terms of PSNR and SSIM; the results are shown in Table 1. We also evaluate the effect of different Gaussian kernel sizes on baseline performance: ESRGAN (lkernel), GLEAN (lkernel), and Real-ESRGAN (lkernel) denote the use of Gaussian kernels of size 7, while ESRGAN, GLEAN, and Real-ESRGAN use Gaussian kernels of size 3. The experimental results show that our method achieves the best performance on both metrics. Previous methods such as PRL (Xia and Wong, 2018), which do not consider generalization across halftoning algorithms, show sub-optimal performance on test data with relatively strong domain shifts. Compared with standard diffusion models (Dhariwal and Nichol, 2021; Nichol and Dhariwal, 2021; Song et al., 2020a), the proposed CDH achieves better results by taking the blue noise characteristics of the halftone dithering process into account, which enables the model to learn from a more realistic and diverse halftone distribution.

Different Gaussian kernel sizes affect the performance of the image restoration baselines (Wang et al., 2018; Chan et al., 2021; Wang et al., 2021). A larger Gaussian kernel erases the dot dithering patterns in halftones to a greater extent, resulting in restored images with fewer halftone artifacts but also a loss of high-frequency information. Our method outperforms the image restoration baselines regardless of kernel size. This shows that simply applying traditional image restoration approaches is not suitable for the inverse halftoning task, since they do not account for the diverse pixel dithering patterns unique to halftone images.

To verify the effectiveness of the proposed meta-halftone guided network, we compare the quality of generated halftone dithering with and without it; the results are shown in Figure 4.

Figure 4: Experimental results of the meta-halftone guided network. We show the generated halftone dithering results with and without the proposed meta-halftone guided network, as well as the halftone dithering ground truth.

When the meta-halftone guided network is removed, the generated halftones have more obvious dithering artifacts and poorer visual quality. With the meta-halftone guided network, the visual quality of the generated halftones improves and is closer to the ground truth. These results demonstrate the effectiveness of the proposed meta-halftone guided network.

4.3 Effect of Different Halftoning Algorithms

To explore the performance of CDH on different halftoning algorithms, we conduct experiments on halftones generated by the 9 halftoning algorithms, namely Floyd-Steinberg dithering, Jarvis-Judice-Ninke dithering, Stucki dithering, Atkinson dithering, Burkes dithering, Sierra dithering, and several of their variants (Lau and Arce, 2018). The experimental results are shown in Table 2. We observe that the proposed method achieves similar results across the different halftoning algorithms, which further verifies its good generalization to different halftoning algorithms.

Table 2: Model performance on halftones generated by different halftoning algorithms in terms of PSNR and SSIM.

| Halftoning Algorithm | PSNR | SSIM |
|---|---|---|
| Floyd-Steinberg Dithering | 24.46 | 0.735 |
| Simple Floyd-Steinberg Dithering | 24.01 | 0.692 |
| Jarvis-Judice-Ninke Dithering | 24.42 | 0.749 |
| Stucki Dithering | 24.53 | 0.749 |
| Atkinson Dithering | 23.08 | 0.710 |
| Burkes Dithering | 24.69 | 0.746 |
| Sierra Dithering | 24.49 | 0.750 |
| Sierra Lite Dithering | 24.40 | 0.733 |
| Two-row Sierra Dithering | 24.54 | 0.741 |

4.4 Qualitative Results

To further verify the effectiveness of the proposed method, we conduct qualitative experiments to compare the performance of our method and the baselines, illustrated in Figure 5. Continuous-tone images generated by the baseline methods still contain redundant artifacts, i.e., dot dithering patterns that were not completely removed during the inverse halftoning process. In contrast, our model removes these artifacts without introducing redundant patterns, validating the effectiveness of the proposed method.

Figure 5: Illustration of the qualitative experimental results. We show the inverse halftoning results generated by the baseline methods and our method, as well as the ground truth.
5 Conclusion

In this work, we propose a Conditional Diffusion model for image Halftoning (CDH), which consists of a halftone dithering process and an inverse halftoning process. We introduce a generative halftoning method that regards the black pixels in halftones as physically moving particles and makes the randomly distributed particles move under certain guidance through the reverse diffusion process, so as to obtain the desired halftone dithering patterns. A meta-halftone guided network is introduced to avoid redundant patterns and undesired artifacts in the generated halftones. By adopting more diverse halftones from the halftone dithering diffusion process, we further improve the generalization of the inverse halftoning model. We conduct experiments on a dataset covering 9 halftoning algorithms, and quantitative and qualitative experiments demonstrate the effectiveness of the proposed method.

Acknowledgement: The research is supported by Science and Technology Innovation 2030 - New Generation Artificial Intelligence (2020AAA0104401), Beijing Natural Science Foundation (Z190001), and Peng Cheng Laboratory Key Research Project No. PCL2021A07.

Broader Impact

A potential negative side effect of this work is that the generative model may be used to produce fake images in the halftoning or inverse halftoning process for certain purposes. In addition, most of the images in our dataset are collected from the Internet and may contain biases. Such biases are preserved during model training and may be reflected in the generated image samples (e.g., inducing the model to produce undesired results). An example of unethical use would be to first train the model on biased or discriminatory data and then induce it to produce unfaithful results when inverse halftoning images; this could misrepresent the original content of some historical prints and mislead people.

Limitation Analysis

A limitation of our model is that the diffusion process is relatively slow and thus requires more inference time than traditional inverse halftoning models, which is limiting when the model is deployed on mobile devices. One way to address this limitation is to design parallel inverse halftoning methods, such as restoring the low-frequency and high-frequency components of images in parallel, thereby increasing the sampling speed.

References

David Blatner and Stephen F Roth. Real world scanning and halftones. Peachpit Press, 1993.

Fergus W Campbell, Janus J Kulikowski, and J Levinson. The effect of orientation on the visual resolution of gratings. The Journal of Physiology, 187(2):427-436, 1966.

Kelvin CK Chan, Xintao Wang, Xiangyu Xu, Jinwei Gu, and Chen Change Loy. GLEAN: Generative latent bank for large-factor image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 14245-14254, 2021.

Li-Ming Chen and Hsueh-Ming Hang. An adaptive inverse halftoning algorithm. IEEE Transactions on Image Processing, 6(8):1202-1209, 1997.

Kuo-Liang Chung and Shih-Tung Wu. Inverse halftoning algorithm using edge-based lookup table approach. IEEE Transactions on Image Processing, 14(10):1583-1589, 2005.

Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. In Advances in Neural Information Processing Systems, pages 8780-8794, 2021.

Reiner Eschbach. Error diffusion algorithm with homogenous response in highlight and shadow areas. Journal of Electronic Imaging, 6(3):348-356, 1997.
Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303-338, 2010.

Robert W Floyd. An adaptive algorithm for spatial gray-scale. In Proceedings of the Society of Information Display, volume 17, pages 75-77, 1976.

Qifan Gao, Xiao Shu, and Xiaolin Wu. Deep restoration of vintage photographs from scanned halftone prints. In Proceedings of the IEEE International Conference on Computer Vision, pages 4120-4129, 2019.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016.

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, pages 6840-6851, 2020.

Xianxu Hou and Guoping Qiu. Image companding and inverse halftoning using deep convolutional neural networks. arXiv preprint arXiv:1707.00116, 2017.

Tae-Hoon Kim and Sang Il Park. Deep context-aware descreening and rescreening of halftone images. ACM Transactions on Graphics, 37(4):1-12, 2018.

Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.

Thomas D Kite, Niranjan Damera-Venkata, Brian L Evans, and Alan C Bovik. A fast, high-quality inverse halftoning algorithm for error diffused halftones. IEEE Transactions on Image Processing, 9(9):1583-1592, 2000.

Daniel L Lau and Gonzalo R Arce. Modern digital halftoning. CRC Press, 2018.

Daniel L Lau, Gonzalo R Arce, and Neal C Gallagher. Digital halftoning by means of green-noise masks. Journal of the Optical Society of America A, 16(7):1575-1586, 1999.

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.

Shitong Luo and Wei Hu. Diffusion probabilistic models for 3D point cloud generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2837-2845, 2021.

Murat Mese and Palghat P Vaidyanathan. Look-up table (LUT) method for inverse halftoning. IEEE Transactions on Image Processing, 10(10):1566-1578, 2001.

Murat Mese and PP Vaidyanathan. Improved dot diffusion for image halftoning. In NIP & Digital Fabrication Conference, volume 1999, pages 350-353. Society for Imaging Science and Technology, 1999.

Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162-8171, 2021.

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234-241, 2015.

Hasib Siddiqui and Charles A Bouman. Training-based descreening. IEEE Transactions on Image Processing, 16(3):789-802, 2007.

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256-2265, 2015.

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2020a.
Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, pages 11918-11930, 2019.

Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. In Advances in Neural Information Processing Systems, pages 12438-12448, 2020.

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2020b.

Yang Song, Conor Durkan, Iain Murray, and Stefano Ermon. Maximum likelihood training of score-based diffusion models. In Advances in Neural Information Processing Systems, pages 1415-1428, 2021.

Robert L Stevenson. Inverse halftoning via MAP estimation. IEEE Transactions on Image Processing, 6(4):574-583, 1997.

James R Sullivan, Lawrence A Ray, and Rodney Miller. Design of minimum visual modulation halftone patterns. IEEE Transactions on Systems, Man, and Cybernetics, 21(1):33-38, 1991.

Bin Sun, Shutao Li, and Jun Sun. Scanned image descreening with image redundancy and adaptive filtering. IEEE Transactions on Image Processing, 23(8):3698-3710, 2014.

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818-2826, 2016.

Robert A Ulichney. Void-and-cluster method for dither array generation. In Human Vision, Visual Processing, and Digital Display IV, volume 1913, pages 332-343. International Society for Optics and Photonics, 1993.

Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Yu Qiao, and Chen Change Loy. ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision Workshops, pages 63-79, 2018.

Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data. In International Conference on Computer Vision Workshops, pages 1905-1914, 2021.

Menghan Xia and Tien-Tsin Wong. Deep inverse halftoning via progressively residual learning. In Asian Conference on Computer Vision, pages 523-539, 2018.

Menghan Xia, Wenbo Hu, Xueting Liu, and Tien-Tsin Wong. Deep halftoning with reversible binary pattern. In Proceedings of the IEEE International Conference on Computer Vision, pages 14000-14009, 2021.

Yi Xiao, Chao Pan, Xianyi Zhu, Hai Jiang, and Yan Zheng. Deep neural inverse halftoning. In International Conference on Virtual Reality and Visualization, pages 213-218, 2017.

Zixiang Xiong, Michael T Orchard, and Kannan Ramchandran. Inverse halftoning using wavelets. IEEE Transactions on Image Processing, 8(10):1479-1483, 1999.

Meng Yao and Kevin J Parker. Modified approach to the construction of a blue noise mask. Journal of Electronic Imaging, 3(1):92-97, 1994.

Yan Zhang, Erhu Zhang, Wanjun Chen, Yajun Chen, and Jinghong Duan. Sparsity-based inverse halftoning via semi-coupled multi-dictionary learning and structural clustering. Engineering Applications of Artificial Intelligence, 72:43-53, 2018.

Zhifei Zhang, Yang Song, and Hairong Qi. Age progression/regression by conditional adversarial autoencoder. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5810-5818, 2017.