# Fast Compressive Phase Retrieval under Bounded Noise

Hongyang Zhang,¹ Shan You,²,³ Zhouchen Lin,²,³ Chao Xu²,³
¹ Machine Learning Department, Carnegie Mellon University, U.S.A.
² Key Lab. of Machine Perception (MOE), School of EECS, Peking University, P. R. China
³ Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, P. R. China
hongyanz@cs.cmu.edu, youshan@pku.edu.cn, zlin@pku.edu.cn, xuchao@cis.pku.edu.cn
Equal contributions. Corresponding author.

## Abstract

We study the problem of recovering a $t$-sparse vector $x_0 \in \mathbb{R}^n$ from $m$ quadratic equations $y_i = (a_i^T x_0)^2$ with noisy measurements $y_i$. This is known as compressive phase retrieval, and it has been widely applied to X-ray diffraction imaging, microscopy, quantum mechanics, and other domains. The challenge is to design algorithms that are a) fast and b) noise-tolerant, with c) near-optimal sample complexity. Prior work in this direction typically achieved one or two of these goals, but none enjoyed all three performance guarantees simultaneously. In this work, with a particular set of sensing vectors $a_i$, we give a provable algorithm that is robust to any bounded yet unstructured deterministic noise. Our algorithm requires roughly $\widetilde{O}(t)$ measurements and runs in $\widetilde{O}(tn \log(1/\epsilon))$ time, where $\epsilon$ is the error. This result advances the state of the art and guarantees the applicability of our method to large datasets. Experiments on synthetic and real data verify our theory.

## Introduction

Phase retrieval is a relatively new research topic in machine learning, signal processing, and statistics. In machine learning and signal processing, the goal of phase retrieval is to recover a hidden signal $x$, up to a global sign $\pm x$, from a quadratic system $\{y_i = (a_i^T x)^2\}_{i=1}^m$ with known sensing vectors $\{a_i\}_{i=1}^m$ and measurements $\{y_i\}_{i=1}^m$. In statistics, the vector $x$ represents a set of parameters to be estimated for some probability distribution. The problem becomes more challenging when the signal, or the unknown parameter vector $x$, is $t$-sparse and only $m = \widetilde{O}(t)$¹ equations are available. This is known as the problem of compressive phase retrieval, and it has been widely applied to X-ray diffraction imaging (Schniter and Rangan 2015), microscopy (Miao et al. 2008), quantum mechanics (Corbett 2006), and many other domains. See Figure 1 for an application of compressive phase retrieval to image sensing and restoration. Despite a large amount of work on compressive phase retrieval, many fundamental problems remain unresolved.

¹ We denote by $\widetilde{O}(\cdot)$ the variant of $O(\cdot)$ that omits logarithmic factors.

Figure 1: Recovery results of our algorithm in the framework of compressive phase retrieval; the high-resolution real image shows a crescent Uranus taken by Voyager. (a) real scene; (b) recovered image.

One long-standing challenge is designing fast algorithms with near-optimal sample complexity. If there is no speed requirement, the recovery problem can be solved easily by the dimension lifting approach, whose basic idea is to convert the problem of recovering a sparse signal $x \in \mathbb{R}^n$ into the problem of recovering a rank-one matrix $xx^T \in \mathbb{R}^{n \times n}$ via nuclear norm minimization (Candès, Strohmer, and Voroninski 2013) or matrix completion (Candès, Li, and Soltanolkotabi 2015; Zhang, Lin, and Zhang 2016; Zhang et al. 2015b).
While this method enjoys an intriguing performance guarantee with near-optimal sample complexity, its computational cost is large (Bahmani and Romberg 2015), which severely limits its applicability to large datasets. When the measurements are noisy, the problem becomes even more challenging. Prior work in this direction mostly assumed certain statistical structure on the noise model, e.g., that the noise follows a Poisson distribution, and maximized the likelihood function corresponding to that structure. However, when no assumption is made on the nature of the noise, the likelihood functions are unavailable, and so these approaches do not work. In this paper, with a particular set of sensing vectors, we propose a provable algorithm that is fast, noise-tolerant, and performs well with near-optimal sample complexity.

## Related Work

A large body of literature has investigated the problem of general (a.k.a. non-sparse) phase retrieval (PR) in recent years, proposing various algorithms with sample complexity of the same order as the ambient dimension, up to a logarithmic factor. Specifically, Candès, Strohmer, and Voroninski (2013) proposed dimension lifting. Although the method enjoys $O(n \log n)$ sample complexity, the computational cost is too high for real applications. To resolve this issue, Candès, Li, and Soltanolkotabi (2015) designed the Wirtinger Flow (WF) method (Candès, Li, and Soltanolkotabi 2015; Zhang and Liang 2016), also with sample complexity as small as $O(n \log n)$. In comparison with dimension lifting, its time complexity is only $O(n^3 \log(1/\epsilon))$, which is significantly faster. Here $\epsilon$ indicates the algorithmic precision. Later, Truncated Wirtinger Flow (TWF) (Chen and Candès 2015) reduced the sample complexity to $O(n)$ and the time complexity to $O(n^2 \log(1/\epsilon))$ by truncating large abnormal measurements.

The exploration of sparse structure (Zhang et al. 2015a; 2016) in the phase retrieval context (compressive phase retrieval, CPR) has drawn a lot of attention as well. A natural approach is to inherit the techniques of general phase retrieval while imposing the sparsity constraint. Specifically, Moravec, Romberg, and Baraniuk (2007) formulated sparse phase retrieval as an $\ell_1$ minimization problem with a non-convex constraint. It has been shown empirically that the $t$-sparse signal can be successfully estimated from only $O(t \log \frac{n}{t})$ measurements (Shechtman, Beck, and Eldar 2014). Unfortunately, these non-convex methods typically lack global convergence guarantees. To resolve this issue, some researchers proposed relaxing the non-convex problems to convex ones. Probably the most popular such technique is the sparse dimension lifting method, which lifts the target vector $x$ to a rank-one sparse matrix $xx^T$ and performs nuclear norm minimization to recover the hidden signal (Li and Voroninski 2013; Ohlsson et al. 2012). The sample complexity is as low as $O(t^2 \log n)$ with Gaussian sensing vectors (Li and Voroninski 2013). Recent work (Iwen, Viswanathan, and Wang 2015; Bahmani and Romberg 2015) further reduced the sample complexity to $O(t \log \frac{n}{t})$ using constrained sensing vectors, under the assumption that the measurement vectors lie in a random low-dimensional subspace, so that the recovery process can be decomposed into two steps, both of which are convex models. However, the computational cost of these methods is too large for high-dimensional signal processing.
## Our Contributions

In this paper, we design a fast, noise-tolerant algorithm with near-optimal sample complexity. Our algorithm advances the state-of-the-art approaches in the following aspects:

- Regarding the sample complexity, our algorithm requires only $\widetilde{O}(t)$ measurements, which matches the information-theoretic limit up to a logarithmic factor.
- The time complexity of our algorithm is $\widetilde{O}(tn \log(1/\epsilon))$ for error $\epsilon$, which is significantly faster than the dimension lifting based methods. This result is comparable with the computational cost of compressive sensing, even though compressive phase retrieval is a more challenging problem than compressive sensing.
- Our algorithm is robust to any bounded yet unstructured noise, with a provable performance guarantee. We show that the $\ell_2$ error of our estimator decreases at the rate $O(1/\sqrt{m})$.

The comparison of all three aspects across various algorithms is presented in Table 1. The novelty of our algorithm is a refinement step inspired by disagreement-based active learning. This step enables us to exactly recover the underlying signal $x_0$ with finitely many samples in the noise-free setting, and significantly reduces the error of our estimator in the noisy scenario.

Table 1: Performance indexes in phase retrieval for $t$-sparse $n$-dimensional signals with error $\epsilon$.

|        | WF | TWF | TWF + Sparse | ECPR | Sparse PR | Ours |
|--------|----|-----|--------------|------|-----------|------|
| Noise  | Poisson | Poisson | Sub-exponential | Unstructured | Unstructured | Unstructured |
| Sample | $O(n)$ | $O(n)$ | $O(t^2)$ | $O(t)$ | $O(t)$ | $O(t)$ |
| Time   | $O(n^3 \log(1/\epsilon))$ | $O(n^2 \log(1/\epsilon))$ | not given | not given | not given | $O(tn \log(1/\epsilon))$ |

## Preliminaries

**Problem Description:** The problem of phase retrieval derives from solving a linear system. In the noiseless case, consider $m$ linear equations $b_i = a_i^T x_0$, $i = 1, 2, \dots, m$, where the $a_i \in \mathbb{R}^n$ are sensing vectors and $x_0 \in \mathbb{R}^n$ is the unknown target signal. Phase retrieval assumes that the phases/signs of the $b_i$ are unavailable, and aims at recovering $x_0$ from the quadratic equations $y_i = b_i^2 = (a_i^T x_0)^2 > 0$, $i = 1, 2, \dots, m$, where the $y_i$ are the measurements. This problem becomes more challenging when the underlying vector $x_0$ is $t$-sparse and the number of equations $m$ is far less than the ambient dimension $n$. This is known as the problem of compressive phase retrieval. Our goal is to design an algorithm with sample complexity $\widetilde{O}(t)$ and time complexity $\widetilde{O}(tn)$, advancing the state-of-the-art results.

**Noise Model:** In the noisy case, an adversary can add any bounded yet unstructured noise $\{\eta_i\}_{i=1}^m$ to the clean measurements, namely,

$$y_i = (a_i^T x_0)^2 + \eta_i > 0, \quad i = 1, 2, \dots, m. \quad (1)$$

Here the $\eta_i$ are deterministic constants, and the magnitude of the noise is bounded such that $\|[\eta_1; \dots; \eta_m]\|_2 \le \eta$. Beyond that, no assumptions are made on the nature of the noise, so even adversarial noise is allowed. We investigate the possibility of recovering the hidden signal in this worst case.

**Sampling Model:** To efficiently sense the sparse signal, we study noisy compressive phase retrieval with a particular set of sensing vectors $a_i$ of the form

$$a_i = \Psi^T w_i \in \mathbb{R}^n, \quad (2)$$

where $\Psi \in \mathbb{R}^{d \times n}$ and the $w_i \in \mathbb{R}^d$ are known a priori. Specifically, we assume that the $w_i$ are drawn i.i.d. from $N(0, I)$ in $\mathbb{R}^d$, and that $\Psi$ is a restricted isometry matrix such that

$$(1 - \delta_{2t}) \|x\|_2^2 \le \|\Psi x\|_2^2 \le (1 + \delta_{2t}) \|x\|_2^2 \quad (3)$$

for a constant $\delta_{2t} \in [0, 1)$ and all $2t$-sparse vectors $x$. Therefore, the sensing vectors $\{a_i\}_{i=1}^m$ lie in a fixed $d$-dimensional subspace: the row space of the matrix $\Psi$.
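To make the sampling model concrete, here is a minimal NumPy sketch of the measurement process. The helper name `sample_measurements` is our own, and the Gaussian choice of $\Psi$ (i.i.d. $N(0, 1/d)$ entries, which satisfies (3) with high probability when $d = O(t \log(n/t))$) is one illustrative way to instantiate the restricted isometry matrix:

```python
import numpy as np

def sample_measurements(x0, d, m, eta=None, seed=0):
    """Draw the sampling model (2)-(3): a_i = Psi^T w_i with w_i ~ N(0, I_d)."""
    rng = np.random.default_rng(seed)
    n = x0.size
    Psi = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, n))  # RIP matrix, w.h.p.
    W = rng.normal(size=(m, d))                            # rows are the w_i
    z0 = Psi @ x0                                          # low-dimensional image of x0
    y = (W @ z0) ** 2                                      # clean measurements (a_i^T x0)^2
    if eta is not None:
        y = y + eta                                        # bounded, unstructured noise (1)
    return Psi, W, y
```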
Such a sampling scheme is realistic in many interesting applications (Bahmani and Romberg 2015):

- In imaging through scattering media, one usually models the optical transfer function by a random matrix, a.k.a. the transfer matrix. In this scenario, different LED excitations lead to illumination patterns that lie in the row space of the transfer matrix (Bertolotti et al. 2012; Liutkus et al. 2014).
- In covariance matrix estimation, covariance matrices that are simultaneously low-rank and sparse can be sketched by sensing vectors that lie in a low-dimensional subspace (Chen, Chi, and Goldsmith 2015).
- In the single-pixel camera (Duarte et al. 2008), one modulates the sparse signal $x_0$ in the frequency domain and obtains $F^T D F x_0$, where $F$ is the unitary discrete Fourier transform matrix and $D$ is a diagonal matrix with Rademacher diagonal entries. In this setting, we can characterize the sensing vectors by setting $\Psi = F$ in our sampling model.

## Main Results

In this section, we propose a fast algorithm for robustly recovering a sparse signal from phaseless measurements, and develop our main theoretical contributions under bounded yet unstructured noise. Our algorithm is robust and requires few measurements: with $O(t\, \mathrm{poly}(\log n))$ samples, it converges to the underlying $t$-sparse signal in $\mathbb{R}^n$ with provably small error. The time complexity of our algorithm is $\widetilde{O}(tn \log(1/\epsilon))$, which is significantly faster than dimension lifting based methods (Candès, Li, and Soltanolkotabi 2015; Candès, Strohmer, and Voroninski 2013; Bahmani and Romberg 2015).

### Recovery Algorithm

Given a sensing vector $a \in \mathbb{R}^n$ and the measurement $y = \langle a, x_0 \rangle^2$, to recover the underlying $t$-sparse signal $x_0 \in \mathbb{R}^n$ we maximize the covariance between $y$ and $\langle a, x \rangle^2$ over the candidate space $K = \{x \in \mathbb{R}^n : \|x\|_0 \le t, \|x\|_2 \le 1\}$. Namely, we are interested in the optimization problem

$$\max_{x \in K} \ \mathbb{E} \big( \langle a, x_0 \rangle \langle a, x \rangle \big)^2. \quad (4)$$

Maximizing (4) over the candidate space is a typical non-convex program which cannot be solved in polynomial time. To mitigate the computational issue, we take into account the sampling strategy (2). Roughly speaking, our main procedures are a) projecting the high-dimensional signal in $\mathbb{R}^n$ to a low-dimensional space $\mathbb{R}^d = \mathbb{R}^{\widetilde{O}(t)}$ by our sampling scheme, and optimizing model (4) in the low-dimensional space without the constraint set $K$, which can be done via a closed-form solution; and b) recovering the underlying sparse signal by compressive sensing techniques. To this end, our algorithm has an initialization step and a refinement step for procedure a), and a sparse recovery step for procedure b).

Figure 2: Illustration of the agreement region of $\tilde{z}$ ($\mathrm{Agr\_Reg}(\tilde{z})$) and $\Psi x_0$ modulo a global sign. In the red shaded area, $\mathrm{sign}(w_i^T \Psi x_0) = \mathrm{sign}(w_i^T \tilde{z})$ for all $w_i$ falling in it. It is not hard to see that the agreement region becomes larger as $\tilde{z}$ gets closer to $\Psi x_0$.

**Initialization Step:** In the low-dimensional space $\mathbb{R}^d$, we approximate the expectation in the objective of (4) by the empirical average. Under our sampling oracle (2), the objective function can be approximately rewritten as

$$\mathbb{E}\, y \langle a, x \rangle^2 \approx \frac{1}{m} \sum_{i=1}^m y_i \langle \Psi^T w_i, x \rangle^2 = \frac{1}{m} \sum_{i=1}^m y_i \langle w_i, \Psi x \rangle^2 \triangleq \frac{1}{m} \sum_{i=1}^m y_i \langle w_i, z \rangle^2. \quad (5)$$

Fortunately, the problem of maximizing (5) has a closed-form solution: the leading eigenvector of the matrix $\frac{1}{m} \sum_{i=1}^m y_i w_i w_i^T$. Our theoretical analysis shows that the initialization step outputs a solution of error $\epsilon$ with high probability, provided that the sample size is $O(\epsilon^{-2} t\, \mathrm{poly}(\log n))$.
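The following sketch computes the initializer as the leading eigenvector of $\frac{1}{m} \sum_i y_i w_i w_i^T$. We use a dense eigendecomposition for brevity; a few power iterations on the same matrix would be a cheaper alternative consistent with the per-step cost discussed later:

```python
import numpy as np

def initialize(W, y):
    """Initialization step: maximize (1/m) sum_i y_i (w_i^T z)^2 over ||z||_2 <= 1.
    The maximizer is the leading eigenvector of M = (1/m) sum_i y_i w_i w_i^T."""
    m, d = W.shape
    M = (W.T * y) @ W / m             # (1/m) sum_i y_i w_i w_i^T, a d x d matrix
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, -1]             # unit-norm leading eigenvector z_tilde
```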
**Recovery via Refinement:** Although the initialization step enjoys a solid theoretical guarantee, the sample complexity of $O(\epsilon^{-2} t\, \mathrm{poly}(\log n))$ implies infinitely many measurements as the error $\epsilon$ goes to zero, even in the noise-free setting. To alleviate this issue, we draw inspiration from the fact that phase retrieval is as easy as solving a linear system once the phases of the measurements are known a priori. To exploit this, we note that the output of the initialization step already contains certain label information. Specifically, we first run the initialization step and obtain a solution $\tilde{z}$ with small constant error, which implies a small constant angle between $\Psi x_0$ and $\tilde{z}$. For this step, we only need $O(t\, \mathrm{poly}(\log n))$ samples. We can then safely label all $w_i$'s lying in the agreement region of $\Psi x_0$ and $\tilde{z}$ with correct phases modulo a global sign (see Figure 2), and solve the resulting linear system by least squares.

**Sparse Recovery:** Since the operator $\Psi$ is an almost isometric mapping from $\mathbb{R}^n$ to $\mathbb{R}^d$ for all $t$-sparse signals, we can recover the signal in the original space $\mathbb{R}^n$ by standard compressive sensing techniques.

Our three-stage approach is summarized in Algorithm 1.

**Time Complexity:** The initialization step computes the leading eigenvector of a $d \times d$ matrix and runs in $O(d^2)$ time. The refinement step solves a least squares problem, which requires at most $O(d^3)$ time. For the sparse recovery step, solving an $\ell_1$ norm minimization problem takes $O(dn \log(1/\epsilon))$ time by the alternating direction method of multipliers (Hong and Luo 2012), where $\epsilon$ indicates the error. As $d = O(t\, \mathrm{poly}(\log n)) \ll n$, the running time of our algorithm is $O(tn\, \mathrm{poly}(\log n) \log(1/\epsilon))$, which is significantly faster than dimension lifting methods for sparse recovery (Bahmani and Romberg 2015).

**Algorithm 1** Robust Recovery of Sparse Signal by Phase Retrieval

**Input:** A set of sensing vectors $\{a_i \in \mathbb{R}^n : i = 1, 2, \dots, m\}$ drawn according to the sampling oracle (2), where the $w_i \in \mathbb{R}^d$ are sampled i.i.d. from the Gaussian distribution $N(0, I)$; a set of measurements $\{y_i : i = 1, 2, \dots, m\}$ generated by (1).

**Initialize:** Solve (6) and obtain the optimum $\tilde{z}$:

$$\tilde{z} := \arg\max_z \frac{1}{m} \sum_{i=1}^m y_i (w_i^T z)^2, \quad \text{s.t. } \|z\|_2 \le 1. \quad (6)$$

**Refine:**
1. Determine the index set $\Omega = \{i : w_i \in \mathrm{Agr\_Reg}(\tilde{z})\}$ according to $\mathrm{Agr\_Reg}(\tilde{z})$ in Figure 2 or (9).
2. Construct the design matrix $W$ and vector $b$, where $(\cdot)_{\Omega:}$ extracts the rows indexed by $\Omega$: $W = [w_1^T; \dots; w_m^T]_{\Omega:}$, $b = [\sqrt{y_1}; \dots; \sqrt{y_m}]_{\Omega:}$.
3. Allocate phases to the vector $b$ according to $\tilde{z}$, modulo a global sign.
4. Solve the least squares problem

$$\hat{z} := \arg\min_z \|Wz - b\|_2^2. \quad (7)$$

**Recover:** Perform sparse recovery by

$$\hat{x} := \arg\min_x \|x\|_1, \quad \text{s.t. } \|\hat{z} - \Psi x\|_2 \le O(\eta/\sqrt{m}). \quad (8)$$

**Output:** Sparse estimator $\hat{x}$.

### Recovery Guarantee

The analysis of the optimization problems (6), (7), and (8) leads to the following guarantee for Algorithm 1.

**Theorem 1.** Let $\{a_i\}_{i=1}^m \subseteq \mathbb{R}^n$ be random vectors sampled i.i.d. according to the sampling model in the Preliminaries section, with $d \ge C t \log(n/t)$. Assume that the measurements $\{y_i\}_{i=1}^m$ follow the model $y_i = \langle \Psi^T w_i, x_0 \rangle^2 + \eta_i$. Let $x_0 \in \mathbb{R}^n$ be the underlying $t$-sparse signal such that $\|\Psi x_0\|_2 = 1$, and let $m \ge c_0 d \log^4 \frac{d}{\delta \epsilon}$. Then with probability at least $1 - \delta$, the output $\hat{x}$ of Algorithm 1 satisfies

$$\min_{\pm} \|\hat{x} \mp x_0\|_2 \le O(\eta/\sqrt{m}).$$

Theorem 1 implies a strong guarantee on the recoverability of Algorithm 1: the algorithm can approximately recover the underlying signal with small error under adversarial noise, provided that the sample size is of order $O(t\, \mathrm{poly}(\log n))$. When there is no noise, our algorithm exactly recovers the target vector modulo a global sign.
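The refinement and recovery steps can be sketched as follows. The margin test implementing $\mathrm{Agr\_Reg}(\tilde{z})$, the threshold `rho` (playing the role of $2\sqrt{\epsilon}$), and the ISTA solver standing in for the constrained $\ell_1$ program (8) are all illustrative choices under our assumptions, not the paper's exact implementation:

```python
import numpy as np

def refine_and_recover(W, y, z_tilde, Psi, rho=0.3, lam=1e-3, iters=500):
    """Refine and Recover steps of Algorithm 1 (a sketch).

    A vector w lies in Agr_Reg(z_tilde) of (9) when |w^T z_tilde| exceeds
    sin(rho) * ||w||_2, since then sign(w^T v) = sign(w^T z_tilde) for every
    v within angle rho of z_tilde; in particular the assigned phase matches
    sign(w^T Psi x0) modulo a global sign.
    """
    margins = np.abs(W @ z_tilde) / np.linalg.norm(W, axis=1)
    omega = margins >= np.sin(rho)                       # index set Omega
    W_om = W[omega]
    b = np.sign(W_om @ z_tilde) * np.sqrt(np.maximum(y[omega], 0.0))  # phased b
    z_hat, *_ = np.linalg.lstsq(W_om, b, rcond=None)     # least squares (7)
    # Sparse recovery: ISTA on the Lagrangian form of (8),
    #   min_x 0.5 * ||Psi x - z_hat||_2^2 + lam * ||x||_1.
    x = np.zeros(Psi.shape[1])
    step = 1.0 / np.linalg.norm(Psi, 2) ** 2             # 1/L with L = ||Psi||_2^2
    for _ in range(iters):
        x = x - step * (Psi.T @ (Psi @ x - z_hat))       # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)  # soft-threshold
    return x
```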
We compare our result with several related lines of research. The first is the paper of Candès et al. (Candès, Li, and Soltanolkotabi 2015), which gives a sample complexity bound of order $O(n \log n)$ in the non-sparse case via Wirtinger flow. Later, Chen et al. (Chen and Candès 2015) improved the result to $O(n)$ by truncating measurements of large magnitude. The noise model in their work is restricted to the Poisson distribution. In comparison, our work improves over these results in two ways: a) we take into account the sparsity structure of the underlying signal, reducing the sample complexity to only a logarithmic dependence on the ambient dimension $n$; b) our noise model may have arbitrary structure, so even adversarial errors are allowed.

Several recent works study the problem of compressive phase retrieval. One of them is the paper of Bahmani and Romberg (2015), which requires $O(t\, \mathrm{poly}(\log n))$ samples, but considerably more running time, to approximately recover the target signal under the same sampling model as ours. Compared with this, our algorithm has comparable sample complexity, while our time complexity $\widetilde{O}(tn)$ is significantly lower than that of their approach. Yin et al. (2015) used sparse-graph codes to formulate a PhaseCode-style algorithm with similar complexity; however, its measurement matrix is specially designed and does not match real applications such as imaging through scattering media. Cai et al. (Cai, Li, and Ma 2015) proposed applying a Wirtinger flow based method to sparse phase retrieval, with sample complexity $O(t^2\, \mathrm{poly}(\log n))$. In comparison, our result improves over theirs by an order of $t$.

## Proof Outline

Consider the optimization problem in the initialization step:

$$\max_{\|z\|_2 \le 1} \ \frac{1}{m} \sum_{i=1}^m \big((w_i^T \Psi x_0)^2 + \eta_i\big)(w_i^T z)^2.$$

Denote by $f_{x_0}(z)$ the objective function, whose subscript $x_0$ indicates that $f$ is a random function with distribution depending on $x_0$. We claim that for any $z$ that is far away from $\Psi x_0$, $f_{x_0}(z)$ cannot be close to the maximum. To see this, we begin with an analysis of the expectation of the objective function $f_{x_0}(z)$.

**Lemma 2** (Expectations). Let $\|\Psi x_0\|_2 = 1$. Suppose that $\{w_i\}_{i=1}^m$ are drawn i.i.d. from the Gaussian distribution $N(0, I)$. Then for any $z \in \mathbb{R}^d$, we have

$$\mathbb{E} f_{x_0}(z) = \Big(\frac{1}{m} \sum_{i=1}^m \eta_i\Big) \|z\|_2^2 + 2 z^T \Psi x_0 x_0^T \Psi^T z + \|z\|_2^2.$$

In particular, if we further have $\|z\|_2 = 1$, then

$$\mathbb{E}[f_{x_0}(\Psi x_0) - f_{x_0}(z)] \ge \frac{1}{2} \min_{\pm} \|z \mp \Psi x_0\|_2^2.$$

*Proof.* Note that $\mathbb{E}(w_i^T \Psi x_0)^2 w_i w_i^T = I + 2 \Psi x_0 x_0^T \Psi^T$. So we have

$$\mathbb{E}(w_i^T \Psi x_0)^2 (w_i^T z)^2 = z^T \big[\mathbb{E}(w_i^T \Psi x_0)^2 w_i w_i^T\big] z = \|z\|_2^2 + 2 z^T \Psi x_0 x_0^T \Psi^T z.$$

We also have $\mathbb{E}(w_i^T z)^2 = \|z\|_2^2$. So

$$\mathbb{E} f_{x_0}(z) = \Big(\frac{1}{m} \sum_{i=1}^m \eta_i\Big) \|z\|_2^2 + 2 z^T \Psi x_0 x_0^T \Psi^T z + \|z\|_2^2.$$

Furthermore, if $\|z\|_2 = 1$, then

$$\mathbb{E}[f_{x_0}(\Psi x_0) - f_{x_0}(z)] = 2 \langle \Psi x_0, \Psi x_0 \rangle^2 - 2 \langle z, \Psi x_0 \rangle^2 = 2 \sin^2 \min_{\pm} \theta(z, \pm\Psi x_0) \ge \frac{1}{2} \min_{\pm} \theta^2(z, \pm\Psi x_0) \ge \frac{1}{2} \min_{\pm} \|z \mp \Psi x_0\|_2^2. \quad \square$$

Lemma 2 asserts that the optimal solution $z^\star$ of the expected form of (6),

$$\max_{\|z\|_2 \le 1} \ \mathbb{E}\, \frac{1}{m} \sum_{i=1}^m y_i (w_i^T z)^2,$$

is exactly the desired vector $\Psi x_0$ modulo a global sign. By concentration of measure, the output of the initialization step will be sufficiently close to $\Psi x_0$ when the sample size $m$ is large enough.
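Lemma 2 is easy to verify numerically; the following self-contained check compares the empirical average of the noiseless objective against the predicted value $\|z\|_2^2 + 2(z^T \Psi x_0)^2$ for unit vectors (the dimension and sample size below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 8, 500_000
z0 = rng.normal(size=d); z0 /= np.linalg.norm(z0)   # stands in for Psi x0
z = rng.normal(size=d);  z /= np.linalg.norm(z)

W = rng.normal(size=(m, d))
empirical = np.mean((W @ z0) ** 2 * (W @ z) ** 2)   # noiseless f_{x0}(z)
predicted = 1.0 + 2.0 * float(z @ z0) ** 2          # ||z||^2 + 2 (z^T Psi x0)^2
print(empirical, predicted)  # agree up to O(1/sqrt(m)) fluctuation
```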
Formally, the following result shows that $f_{x_0}(z)$ does not deviate far from $\mathbb{E} f_{x_0}(z)$, uniformly over all $z \in \{z \in \mathbb{R}^d : \|z\|_2 \le 1\}$, when $m$ is large. The proof can be found in the supplementary material.

**Lemma 3** (Concentration of Measure). Let $\{w_i\}_{i=1}^m$ be random vectors sampled i.i.d. from the Gaussian distribution $N(0, I)$, and suppose that $m \ge c_0 d \log^4(\frac{1}{\delta}) \epsilon^{-2}$ for a constant $c_0$. Then with probability at least $1 - \delta$,

$$\sup_{\|z\|_2 \le 1} |f_{x_0}(z) - \mathbb{E} f_{x_0}(z)| \le \epsilon.$$

Now we prove the correctness of the initialization step.

**Theorem 4.** Let $\{a_i\}_{i=1}^m \subseteq \mathbb{R}^n$ be random vectors sampled i.i.d. from the sampling model in the Preliminaries section. Assume that the $\{y_i\}_{i=1}^m$ follow the model $y_i = (a_i^T x_0)^2 + \eta_i$. Let $\delta > 0$ and $m \ge c_0 d \log^4(\frac{1}{\delta}) \epsilon^{-2}$. Then with probability at least $1 - \delta$, the solution $\tilde{z}$ of the program (6) satisfies $\min_{\pm} \|\tilde{z} \mp \Psi x_0\|_2^2 \le \epsilon$.

*Proof.* The result is an immediate consequence of Lemmas 2 and 3. Specifically, note that the optimal solution $\tilde{z}$ of (6) satisfies $\|\tilde{z}\|_2 = 1$. Thus

$$0 \le f_{x_0}(\tilde{z}) - f_{x_0}(\Psi x_0) \quad (\tilde{z} \text{ is the optimal solution})$$
$$\le \mathbb{E} f_{x_0}(\tilde{z}) + \frac{\epsilon}{4} - \mathbb{E} f_{x_0}(\Psi x_0) + \frac{\epsilon}{4} \quad \text{(by Lemma 3)}$$
$$= \mathbb{E}[f_{x_0}(\tilde{z}) - f_{x_0}(\Psi x_0)] + \frac{\epsilon}{2}$$
$$\le -\frac{1}{2} \min_{\pm} \|\tilde{z} \mp \Psi x_0\|_2^2 + \frac{\epsilon}{2} \quad \text{(by Lemma 2)}.$$

Thus we obtain $\min_{\pm} \|\tilde{z} \mp \Psi x_0\|_2^2 \le \epsilon$. $\square$

We then prove the correctness of the refinement step, i.e., of model (7). We need the following lemma on the probability that a Gaussian vector falls in the disagreement region.

**Lemma 5** ((Balcan, Blum, and Vempala 2014)). Denote by $D$ an isotropic log-concave distribution in $\mathbb{R}^n$. Then there exist constants $c$ and $c'$ such that for any two vectors $u$ and $v$ in $\mathbb{R}^n$,

$$c\, \theta(u, v) \le \Pr_{x \sim D}[\mathrm{sign}(u^T x) \ne \mathrm{sign}(v^T x)] \le c'\, \theta(u, v).$$

Lemma 5 states that the disagreement probability depends only on the angle between the two vectors, independently of the ambient dimension. Now we are ready to prove the correctness of the refinement step.

*Proof.* Let $z_0 = \Psi x_0$. By Theorem 4, with $m \ge c_0 d \log^4 \frac{1}{\delta}$ the initialization step outputs a solution with small constant error $\epsilon$. Since $\min_{\pm} \theta(\tilde{z}, \pm z_0) \le 2 \min_{\pm} \|\tilde{z} \mp z_0\|_2 \le 2\sqrt{\epsilon}$, the agreement region of $\tilde{z}$ and $z_0$ contains the set

$$\mathrm{Agr\_Reg}(\tilde{z}) = \{w \in \mathbb{R}^d : \mathrm{sign}(\tilde{z}^T w) = \mathrm{sign}(v^T w) \text{ for all } v \text{ with } \theta(\tilde{z}, v) \le 2\sqrt{\epsilon}\}, \quad (9)$$

which, by Lemma 5, each $w_i$ falls into with probability at least $1 - c' \sqrt{\epsilon}$ for a constant $c'$. So on average there will be $m(1 - c'\sqrt{\epsilon}) \ge c_0 d \log^4(\frac{1}{\delta})(1 - c'\sqrt{\epsilon})$ points falling in $\mathrm{Agr\_Reg}(\tilde{z})$. Using a Chernoff bound, the number of examples falling in the agreement region grows in the same order as its expectation, i.e., $c'' d \log^4(\frac{1}{\delta})(1 - c'\sqrt{\epsilon})$ for some constant $c''$, with probability at least $1 - \delta$. By a standard analysis of sub-Gaussian matrices, the smallest singular value of $W$ is of order $\Theta(\sqrt{m})$ (Vershynin 2010). On the other hand, the optimal solution to the least squares problem (7) is $\hat{z} = W^{\dagger} b$. So

$$\|\hat{z} - z_0\|_2 \le \eta \|W^{\dagger}\|_2 = \eta / \sigma_{\min}(W) = O(\eta/\sqrt{m}).$$

Finally, a straightforward application of results on compressive sensing leads to Theorem 1. $\square$

## Numerical Experiments

In this section, we conduct numerical simulations to evaluate the performance of the proposed Algorithm 1 and compare it with other competing methods.

### Synthetic Simulations

We first investigate the recovery performance of the proposed Algorithm 1. We fix the dimension of the ground-truth sparse real signal $x_0$ to be 256 (i.e., $n = 256$). The non-zero indices of $x_0$ are selected independently and uniformly at random; the entry values are drawn i.i.d. from $N(0, 1)$ and normalized so that $\|x_0\|_2 = 1$. Denote the estimated signal by $\hat{x}$; since CPR focuses on the direction of the target signal, we evaluate the recovery performance by the relative error $\min_{\pm} \|\hat{x} \mp x_0\|_2 / \|x_0\|_2$. The entries of the noise vector $\eta$ are drawn i.i.d. from a Gaussian mixture distribution with four components of identical variance $\sigma^2$. We refer to $\sigma^2$ as the noise level, and in our experiments we choose $10^{-2}$ or $10^{-4}$.
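For completeness, this is how the evaluation metric and the noise can be generated; the component means of the Gaussian mixture are our own assumption, as the paper does not specify them:

```python
import numpy as np

def relative_error(x_hat, x0):
    """Relative error modulo a global sign: min over +/- of ||x_hat -+ x0|| / ||x0||."""
    err = min(np.linalg.norm(x_hat - x0), np.linalg.norm(x_hat + x0))
    return err / np.linalg.norm(x0)

def mixture_noise(m, sigma2, seed=0):
    """Noise from a four-component Gaussian mixture with common variance sigma2.
    The component means below are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    means = np.sqrt(sigma2) * np.array([-2.0, -1.0, 1.0, 2.0])
    comps = rng.integers(0, 4, size=m)
    return rng.normal(means[comps], np.sqrt(sigma2))
```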
The sensing vectors $\{w_i\}_{i=1}^m$ are generated i.i.d. from $w_i \sim N(0, I)$ on $\mathbb{R}^d$, and the compressive matrix $\Psi$ with the restricted isometry property is also Gaussian, with i.i.d. $N(0, \frac{1}{d})$ entries. Three parameters affect the recovery accuracy: the support size $t$, the low dimension $d = O(t \log(n/t))$, and the sample size $m$. We select the support size $t$ in the set $\{2, 4, 6, \dots, 20\}$ and vary the values of $d$ and $m$ to investigate the influence of their choices on the recovery accuracy of our method. All experiments are run over 100 trials, and for the Recover step in Algorithm 1 we use the SPAMS toolbox (Mairal et al. 2009).

Figure 3 shows the 0.9 quantiles of the relative error versus $t$ for different choices of $d$ and $m$, and the corresponding average running time. As shown in Figure 3, the recovery error is satisfactory and relatively stable under noise across sparsity levels. As the sample size $m$ increases, the recovery error drops significantly, owing to a more accurate initial guess and better refinement results. Besides, a larger embedding dimension $d$ also contributes to the improvement of recovery accuracy. One might expect a high noise level to have a strong negative impact on the recovery performance; however, our algorithm remains notably robust against high-level noise. As for efficiency, the running time mainly depends on the values of $m$ and $d$: $m$ dominates the computational cost of the refinement step, while $d$ directly affects Steps 1 and 3. Note that there is a tradeoff between recovery accuracy and efficiency, reflected by $m$ and $d$: the larger $m$ and $d$ are, the better the accuracy and the more time the algorithm costs.

We also compare our algorithm with other state-of-the-art methods, including ECPR (Bahmani and Romberg 2015), Sparse PR (Iwen, Viswanathan, and Wang 2015), and TWF (Chen and Candès 2015). The generation scheme for the ground-truth signal $x_0$, the compressive matrix $\Psi$, and the sensing vectors $w_i$ is the same as in the first experiment apart from the parameter choices. Specifically, we choose $d = \lceil 2t(1 + \log \frac{n}{t}) \rceil$, $m = 10d$, and $\sigma^2 = 10^{-2}$ for each $t \in \{6, 8, 10, 12, 14\}$ (see the parameter sketch after Table 2). Table 2 shows the 0.9 quantiles of the relative error over 100 trials, and also reports the average running time for all competing methods. As shown in Table 2, the errors of TWF are much larger than those of the other three methods (all below 0.1), because TWF does not exploit the inherent sparse structure of real signals. Our algorithm is comparable with ECPR and Sparse PR in recovery performance, and all three achieve excellent recovery accuracy. However, our algorithm is far faster than the other algorithms, roughly 200 to 300 times faster than ECPR and Sparse PR.

Table 2: The performance over 100 trials of different algorithms with $n = 256$.

Relative error:

| $t$ | TWF | ECPR | Sparse PR | Ours |
|-----|--------|--------|--------|--------|
| 6   | 0.6081 | 0.0568 | 0.0566 | 0.0605 |
| 8   | 0.5142 | 0.0629 | 0.0626 | 0.0624 |
| 10  | 0.4367 | 0.0559 | 0.0556 | 0.0533 |
| 12  | 0.3092 | 0.0576 | 0.0590 | 0.0585 |
| 14  | 0.1524 | 0.0575 | 0.0577 | 0.0580 |

Average running time (s):

| $t$ | TWF | ECPR | Sparse PR | Ours |
|-----|--------|---------|---------|--------|
| 6   | 0.0816 | 4.0276  | 4.0301  | 0.0102 |
| 8   | 0.0918 | 5.0880  | 5.0919  | 0.0166 |
| 10  | 0.1063 | 6.4807  | 6.4871  | 0.0267 |
| 12  | 0.1169 | 8.2197  | 8.2098  | 0.0350 |
| 14  | 0.1295 | 10.3199 | 10.3323 | 0.0477 |
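The parameter choices above are straightforward to reproduce; a small sketch, assuming the $\lceil \cdot \rceil$ convention stated in Figure 3:

```python
import numpy as np

def embedding_dims(t, n):
    """d1 = ceil(2t(1 + log(n/t))) and d2 = ceil(2t log(n/t)) as in Figure 3;
    the Table 2 comparison uses d = d1 and m = 10 * d."""
    d1 = int(np.ceil(2 * t * (1 + np.log(n / t))))
    d2 = int(np.ceil(2 * t * np.log(n / t)))
    return d1, d2

for t in (6, 8, 10, 12, 14):
    d1, _ = embedding_dims(t, 256)
    print(t, d1, 10 * d1)   # t, d, m in the Table 2 setting
```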
Figure 3: The recovery and efficiency performance over 100 trials w.r.t. different sparsity levels $t$ for different choices of $m$ and $d$ with $n = 256$: (a) empirical 0.9 quantile of the relative error; (b) average running time (s). The curves correspond to $m \in \{8d, 10d\}$, $d \in \{d_1, d_2\}$, and $\sigma^2 \in \{10^{-1}, 10^{-2}\}$, where $d_1 = \lceil 2t(1 + \log \frac{n}{t}) \rceil$ and $d_2 = \lceil 2t \log \frac{n}{t} \rceil$, and $\lceil \cdot \rceil$ denotes the smallest integer larger than a real number.

### Real Experiment: Image Recovery

We also apply our algorithm to an image recovery problem. We select an image of a crescent Uranus taken by Voyager 2,² shown in Figure 1, and vectorize it. Then we draw a Gaussian compressive matrix $\Psi$ and sensing vectors $w_i$ as in the synthetic simulations, and we recover the whole image using our three-stage algorithm with mixed Gaussian noise of level $\sigma^2 = 10^{-2}$. As shown in Figure 1, the recovered image is of high quality, with little distortion. Note that the additional light dots in the recovered image (visible when zooming in) can be effectively removed by various filtering methods, such as median filtering. Also note that the comparable compressive algorithms ECPR and Sparse PR could not be run on our laptop computer, because the lifted signals do not fit into the limited memory budget.

## Conclusions

In this paper, we investigate the problem of compressive phase retrieval. There are three challenges in this field: efficiency, robustness, and sample complexity. Prior work mostly addresses one or two of these aspects, which limits its applicability to real applications. To address all three challenges simultaneously, we propose a fast and robust algorithm for compressive phase retrieval that is able to handle any bounded yet unstructured noise. For a $t$-sparse signal in $\mathbb{R}^n$, our algorithm recovers the ground truth with small error using $\widetilde{O}(t)$ samples and finishes in $\widetilde{O}(tn \log(1/\epsilon))$ time, based on a series of theoretical analyses. Synthetic and real experiments validate the superiority of our algorithm in both recovery accuracy and computational efficiency.

## Acknowledgements

Zhouchen Lin is supported by the National Basic Research Program of China (973 Program) (grant no. 2015CB352502), the National Natural Science Foundation (NSF) of China (grant nos. 61625301 and 61231002), and Qualcomm. Shan You and Chao Xu are supported by the National Natural Science Foundation of China under Grant NSFC 61375026 and 2015BAF15B00.

² https://www.nasa.gov/sites/default/files/thumbnails/image/pia00143.jpg

## References

Bahmani, S., and Romberg, J. 2015. Efficient compressive phase retrieval with constrained sensing vectors. In Advances in Neural Information Processing Systems, 523–531.

Balcan, M.-F.; Blum, A.; and Vempala, S. 2014. Efficient representations for life-long learning and autoencoding. arXiv preprint arXiv:1411.1490.

Bertolotti, J.; van Putten, E. G.; Blum, C.; Lagendijk, A.; Vos, W. L.; and Mosk, A. P. 2012. Non-invasive imaging through opaque scattering layers. Nature 491(7423):232–234.

Cai, T. T.; Li, X.; and Ma, Z. 2015. Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. arXiv preprint arXiv:1506.03382.
Candès, E. J.; Li, X.; and Soltanolkotabi, M. 2015. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory 61(4):1985–2007.

Candès, E. J.; Strohmer, T.; and Voroninski, V. 2013. PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics 66(8):1241–1274.

Chen, Y., and Candès, E. J. 2015. Solving random quadratic systems of equations is nearly as easy as solving linear systems. In Advances in Neural Information Processing Systems, 739–747.

Chen, Y.; Chi, Y.; and Goldsmith, A. J. 2015. Exact and stable covariance estimation from quadratic sampling via convex programming. IEEE Transactions on Information Theory 61(7):4034–4059.

Corbett, J. 2006. The Pauli problem, state reconstruction and quantum-real numbers. Reports on Mathematical Physics 57(1):53–68.

Duarte, M. F.; Davenport, M. A.; Takhar, D.; Laska, J. N.; Sun, T.; Kelly, K. E.; Baraniuk, R. G.; et al. 2008. Single-pixel imaging via compressive sampling. IEEE Signal Processing Magazine 25(2):83.

Hong, M., and Luo, Z.-Q. 2012. On the linear convergence of the alternating direction method of multipliers. arXiv preprint arXiv:1208.3922.

Iwen, M.; Viswanathan, A.; and Wang, Y. 2015. Robust sparse phase retrieval made easy. Applied and Computational Harmonic Analysis.

Li, X., and Voroninski, V. 2013. Sparse signal recovery from quadratic measurements via convex programming. SIAM Journal on Mathematical Analysis 45(5):3019–3033.

Liutkus, A.; Martina, D.; Popoff, S.; Chardon, G.; Katz, O.; Lerosey, G.; Gigan, S.; Daudet, L.; and Carron, I. 2014. Imaging with nature: Compressive imaging using a multiply scattering medium. Scientific Reports 4.

Mairal, J.; Bach, F.; Ponce, J.; and Sapiro, G. 2009. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning, 689–696. ACM.

Miao, J.; Ishikawa, T.; Shen, Q.; and Earnest, T. 2008. Extending X-ray crystallography to allow the imaging of noncrystalline materials, cells, and single protein complexes. Annual Review of Physical Chemistry 59:387–410.

Moravec, M. L.; Romberg, J. K.; and Baraniuk, R. G. 2007. Compressive phase retrieval. In Optical Engineering + Applications, 670120. International Society for Optics and Photonics.

Ohlsson, H.; Yang, A.; Dong, R.; and Sastry, S. 2012. CPRL: An extension of compressive sensing to the phase retrieval problem. In Advances in Neural Information Processing Systems, 1367–1375.

Schniter, P., and Rangan, S. 2015. Compressive phase retrieval via generalized approximate message passing. IEEE Transactions on Signal Processing 63(4):1043–1055.

Shechtman, Y.; Beck, A.; and Eldar, Y. C. 2014. GESPAR: Efficient phase retrieval of sparse signals. IEEE Transactions on Signal Processing 62(4):928–938.

Vershynin, R. 2010. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027.

Yin, D.; Lee, K.; Pedarsani, R.; and Ramchandran, K. 2015. Fast and robust compressive phase retrieval with sparse-graph codes. In 2015 IEEE International Symposium on Information Theory (ISIT), 2583–2587.

Zhang, H., and Liang, Y. 2016. Reshaped Wirtinger flow for solving quadratic systems of equations. arXiv preprint arXiv:1605.07719.

Zhang, H.; Lin, Z.; Zhang, C.; and Chang, E. 2015a. Exact recoverability of robust PCA via outlier pursuit with tight recovery bounds. In AAAI Conference on Artificial Intelligence, 3143–3149.
Zhang, H.; Lin, Z.; Zhang, C.; and Gao, J. 2015b. Relations among some low rank subspace recovery models. Neural Computation 27:1915–1950.

Zhang, W.; Zhang, L.; Jin, R.; Cai, D.; and He, X. 2016. Accelerated sparse linear regression via random projection. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI), 2337–2343.

Zhang, H.; Lin, Z.; and Zhang, C. 2016. Completing low-rank matrices with corrupted samples from few coefficients in general basis. IEEE Transactions on Information Theory 62(8):4748–4768.