# Blind Deconvolutional Phase Retrieval via Convex Programming

Ali Ahmed, Department of Electrical Engineering, Information Technology University, Lahore, Pakistan. ali.ahmed@itu.edu.pk

Alireza Aghasi, Department of Business Analytics, Georgia State University, Atlanta, GA. aaghasi@gsu.edu

Paul Hand, College of Computer and Information Science, Northeastern University, Boston, MA. p.hand@northeastern.edu

We consider the task of recovering two real or complex m-vectors from phaseless Fourier measurements of their circular convolution. Our method is a novel convex relaxation based on a lifted matrix recovery formulation that allows a nontrivial convex relaxation of the bilinear measurements from convolution. We prove that if the two signals belong to known random subspaces of dimensions k and n, then they can be recovered up to the inherent scaling ambiguity with m ≫ (k + n) log² m phaseless measurements. Our method provides the first theoretical recovery guarantee for this problem by a computationally efficient algorithm, and it does not require a solution estimate to be computed for initialization. Our proof is based on Rademacher complexity estimates. Additionally, we provide an ADMM implementation of the method and numerical experiments that verify the theory.

1 Introduction

This paper considers the recovery of two unknown signals (real- or complex-valued) from magnitude-only measurements of their convolution. Let w and x be vectors residing in H^m, where H denotes either R or C. Moreover, denote by F the DFT matrix with entries F[ω, t] = (1/√m) e^{−j2πωt/m}, 1 ≤ ω, t ≤ m. We observe the phaseless Fourier coefficients of the circular convolution w ⊛ x of w and x:

y = |F(w ⊛ x)|, (1)

where |z| returns the element-wise absolute value of the vector z. We are interested in recovering w and x from the phaseless measurements y of their circular convolution.
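As a concrete illustration of the measurement model (1), the following NumPy sketch (our own illustrative code, not the authors') forms y = |F(w ⊛ x)| and checks that the unitary DFT diagonalizes circular convolution:

```python
import numpy as np

m = 8
rng = np.random.default_rng(0)
w = rng.standard_normal(m)
x = rng.standard_normal(m)

# Circular convolution w ⊛ x, computed in the Fourier domain.
conv = np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)))

# Unitary DFT matrix, F[w, t] = exp(-2j*pi*w*t/m) / sqrt(m).
F = np.fft.fft(np.eye(m)) / np.sqrt(m)
y = np.abs(F @ conv)  # phaseless measurements (1)

# The DFT diagonalizes circular convolution: F(w ⊛ x) = sqrt(m) (Fw ⊙ Fx).
y_alt = np.sqrt(m) * np.abs((F @ w) * (F @ x))
```

Here `np.fft.fft` uses the unnormalized convention, hence the explicit 1/√m to make F unitary as in the text.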
In other words, the problem is the blind deconvolution of two signals from phaseless measurements. The problem can also be viewed as identifying the structural properties of w such that its convolution with the signal/image of interest x makes the phase retrieval of x well-posed. Since w and x are both unknown, and the measurements are in addition phaseless, the inverse problem is severely ill-posed: many pairs of w and x correspond to the same y. We show that this non-linear problem can be efficiently solved, under Gaussian measurements, using a semidefinite program, and we prove this assertion theoretically. We also propose a heuristic approach to solve the proposed semidefinite program in a computationally efficient manner. Numerical experiments show that, using this algorithm, one can successfully recover a blurred image from the magnitude-only measurements of its Fourier spectrum.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Phase retrieval has been of continued interest in the fields of signal processing, imaging, physics, and computational science. Perhaps the single most important context in which phase retrieval arises is X-ray crystallography Harrison [1993], Millane [1990], where the far-field pattern of X-rays scattered from a crystal forms the Fourier transform of its image, and it is only possible to measure the intensities of the electromagnetic radiation. With the advancement of imaging technologies, the phase retrieval problem continues to arise in several other imaging modalities such as diffraction imaging Bunk et al. [2007], microscopy Miao et al. [2008], and astronomical imaging Fienup and Dainty [1987].
In the imaging context, the result in this paper means that if light rays are convolved with a generic pattern w (either man-made or arising naturally from propagation through some unknown medium) prior to being scattered/reflected from the object, the image of the object can be recovered afterwards from Fourier intensity measurements. As is well known from Fourier optics Goodman [2008], the convolution of visible light with a generic pattern can be implemented using a lens-grating-lens setup. Despite recent advances in the theoretical understanding of phase retrieval Candes et al. [2013, 2015a], the application to actual problems such as crystallography remains challenging, owing partly to simplistic mathematical models that may not fully capture the physical problem at hand. Our comparatively richer model in (1) more faithfully encompasses the structure of actual physical problems. For example, in crystallography, due to the natural periodic arrangement of a crystal's structural unit, the observed electron density function of the crystal takes exactly the form (1); for details, see Section 2 of Elser et al. [2017]. Blind deconvolution is a fundamental problem in signal processing, communications, and system theory in general. Visible light communication has been proposed as a standard for local area networks in 5G communications Azhar et al. [2013], Retamal et al. [2015], Azhar et al. [2010]. Propagation of information-carrying light through an unknown communication medium is modeled as a convolution. The channel is unknown, and at the receiver it is generally difficult to measure the phase information in the propagated light. The result in this paper says that the transmitted signal can be blindly deconvolved from the unknown channel using only the Fourier intensity measurements of the light.
The reader is referred to the first section of the supplementary note for a detailed description of visible light communication and its connection to our formulation.

1.1 Observations in Matrix Form

The phase retrieval and blind deconvolution problems have been extensively studied in the signal processing community in recent years Candes et al. [2015b], Ahmed et al. [2014] by lifting the unknown vectors to a higher-dimensional matrix space formed by their outer products. The resulting rank-1 matrix is recovered using the nuclear norm as a convex relaxation of the non-convex rank constraint. Recently, other forms of convex relaxations have been proposed Bahmani and Romberg [2017a], Goldstein and Studer [2018], Aghasi et al. [2017a,b] that solve both problems in the native (unlifted) space, leading to computationally efficient convex programs. This paper handles the non-linear convolutional phase retrieval problem by lifting it into a bilinear problem. The resulting problem, though still non-convex, gives way to an effective convex relaxation that provably recovers w and x exactly. It is clear from (1) that uniquely recovering w and x is not possible without extra knowledge or information about the problem. We address the problem under the broad and generally applicable structural assumption that both vectors w and x are members of known subspaces of H^m. This means that w and x can be parameterized in terms of unknown lower-dimensional vectors h ∈ H^k and m ∈ H^n, respectively, as follows:

w = Bh, x = Cm, (2)

where B ∈ H^{m×k} and C ∈ H^{m×n} are known matrices whose columns span the subspaces in which w and x reside, respectively. Recovering h and m would imply the recovery of w and x; therefore, we take h and m as the unknowns in the inverse problem henceforth.
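A small NumPy sketch of the subspace model (2) (ours, with illustrative names; the coefficient vector called m in the text is named `xc` below to avoid clashing with the dimension m) also demonstrates the inherent scaling ambiguity (αh, α⁻¹m), which is the best any method can resolve:

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n = 32, 3, 4
B = rng.normal(0.0, 1.0 / np.sqrt(m), (m, k))  # known basis for the subspace of w
C = rng.normal(0.0, 1.0 / np.sqrt(m), (m, n))  # known basis for the subspace of x
h = rng.standard_normal(k)   # unknown coefficients of w
xc = rng.standard_normal(n)  # unknown coefficients of x

def phaseless(w, x):
    """y = |F(w circ-conv x)| with F the unitary DFT."""
    conv = np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)))
    return np.abs(np.fft.fft(conv)) / np.sqrt(m)

y = phaseless(B @ h, C @ xc)

# The measurements are invariant to the scaling (alpha*h, xc/alpha).
alpha = 2.5
y_scaled = phaseless(B @ (alpha * h), C @ (xc / alpha))
```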
Since the circular convolution operator diagonalizes in the Fourier domain, the measurements in (1) take the following form after incorporating the subspace constraints in (2):

y = (1/√m)|B̂h ⊙ Ĉm|,

where B̂ = √m FB, Ĉ = √m FC, and ⊙ represents the Hadamard (element-wise) product. Denoting by b_ℓ* and c_ℓ* the rows of B̂ and Ĉ, respectively, the entries of the measurements y can be expressed as

y_ℓ² = (1/m)|⟨b_ℓ, h⟩⟨c_ℓ, m⟩|², ℓ = 1, 2, 3, . . . , m.

Evidently, the problem is non-linear in both unknowns. However, it reduces to a bilinear problem in the lifted variables hh* and mm*, taking the form

y_ℓ² = (1/m)⟨b_ℓb_ℓ*, hh*⟩⟨c_ℓc_ℓ*, mm*⟩ = (1/m)⟨b_ℓb_ℓ*, H⟩⟨c_ℓc_ℓ*, M⟩, (3)

where H and M are the rank-1 matrices hh* and mm*, respectively. Treating the lifted variables H and M as unknowns makes the measurements bilinear in the unknowns, a structure that will help us formulate an effective convex relaxation.

1.2 Novel Convex Relaxation

The task of recovering H and M from y in (3) can be naturally posed as an optimization program:

find H, M (4)
subject to (1/m)⟨b_ℓb_ℓ*, H⟩⟨c_ℓc_ℓ*, M⟩ = y_ℓ², ℓ = 1, 2, 3, . . . , m,
rank(H) = 1, rank(M) = 1.

However, both the measurement and the rank constraints are non-convex. Further, the immediate convex relaxation of each measurement constraint is trivial, as the convex hull of the set of (H, M) satisfying y_ℓ² = (1/m)⟨b_ℓb_ℓ*, H⟩⟨c_ℓc_ℓ*, M⟩ is the set of all possible (H, M). To derive our convex relaxation, recall that the true H = hh* and M = mm* are also positive semidefinite (PSD). Incorporating the PSD constraints in the optimization program means that the variables u_ℓ = ⟨b_ℓb_ℓ*, H⟩ and v_ℓ = ⟨c_ℓc_ℓ*, M⟩ are necessarily non-negative. That is,

H ⪰ 0 and M ⪰ 0 ⟹ u_ℓ ≥ 0 and v_ℓ ≥ 0,

where the implication follows directly from the definition of PSD matrices. This observation restricts the hyperbolic constraint set in Figure 1 to the first quadrant only.
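The lifted identity (3) is easy to verify numerically. In this sketch (ours; `xc` again stands for the coefficient vector called m in the text), the left side of (3) comes from the raw phaseless measurements (1) and the right side from the lifted matrices H = hhᵀ and M = xc·xcᵀ:

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n = 16, 3, 4
B = rng.normal(0.0, 1.0 / np.sqrt(m), (m, k))
C = rng.normal(0.0, 1.0 / np.sqrt(m), (m, n))
h = rng.standard_normal(k)
xc = rng.standard_normal(n)

F = np.fft.fft(np.eye(m)) / np.sqrt(m)  # unitary DFT matrix
Bhat = np.sqrt(m) * F @ B               # rows give the vectors b_l
Chat = np.sqrt(m) * F @ C               # rows give the vectors c_l

# y_l^2 directly from the phaseless convolution measurements (1) ...
conv = np.real(np.fft.ifft(np.fft.fft(B @ h) * np.fft.fft(C @ xc)))
ysq = np.abs(F @ conv) ** 2

# ... and from the lifted bilinear form (3).
H, M = np.outer(h, h), np.outer(xc, xc)
ysq_lift = np.empty(m)
for l in range(m):
    u = np.real(np.sum(np.outer(Bhat[l].conj(), Bhat[l]) * H))  # <b_l b_l^*, H>
    v = np.real(np.sum(np.outer(Chat[l].conj(), Chat[l]) * M))  # <c_l c_l^*, M>
    ysq_lift[l] = u * v / m
```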
For a fixed ℓ, we propose replacing the non-convex hyperbolic set {(u_ℓ, v_ℓ) ∈ R² | (1/m)u_ℓv_ℓ = y_ℓ², u_ℓ ≥ 0, v_ℓ ≥ 0} with its convex hull {(u_ℓ, v_ℓ) ∈ R² | (1/m)u_ℓv_ℓ ≥ y_ℓ², u_ℓ ≥ 0, v_ℓ ≥ 0}. In short, our convex relaxation is possible because the PSD constraint from lifting happens to select a specific branch of the hyperbola given by any particular bilinear measurement, and this single branch has a nontrivial convex hull. The rest of the convex relaxation is standard, as the rank constraint in (4) is then relaxed to nuclear-norm minimization, which reduces to trace minimization in the PSD case. Hence, we study the convex program

minimize Tr(H) + Tr(M) (5)
subject to (1/m)⟨b_ℓb_ℓ*, H⟩⟨c_ℓc_ℓ*, M⟩ ≥ y_ℓ², ℓ = 1, 2, . . . , m,
H ⪰ 0, M ⪰ 0.

The following lemma formally proves the convexity of the optimization program above.

Lemma 1. If y ∈ R^m is such that y_ℓ > 0, then the optimization program in (5) is a convex program.

Proof. The objective of (5) is simply linear; we focus on the constraints. For a fixed ℓ, let

S_ℓ := {(H, M) ∈ H^{k×k} × H^{n×n} | (1/m)⟨b_ℓb_ℓ*, H⟩⟨c_ℓc_ℓ*, M⟩ ≥ y_ℓ², H ⪰ 0, M ⪰ 0},
S_{ℓ,1} := {(u_ℓ, v_ℓ) ∈ R² | (1/m)u_ℓv_ℓ ≥ y_ℓ², u_ℓ ≥ 0, v_ℓ ≥ 0}, and
S_{ℓ,2} := {(H, M) ∈ H^{k×k} × H^{n×n} | (⟨b_ℓb_ℓ*, H⟩, ⟨c_ℓc_ℓ*, M⟩) ∈ S_{ℓ,1}}.

To show that S_ℓ is convex, it suffices to show that S_{ℓ,1} and S_{ℓ,2} are convex.

Figure 1: Left: Restriction of the hyperbolic constraint (1/m)u_ℓv_ℓ = y_ℓ² to its first-quadrant branch (u_ℓ > 0). Right: Abstract illustration of the geometry of the convex relaxation. The PSD cone (blue) and the surface of the hyperbolic set (red) formed by two intersecting hyperbolas (m = 2). Evidently, there are multiple points on the surface, and also in the convex hull of the hyperbolic set, that lie on the PSD cone. The minimizer of the optimization program (5) picks the one with minimum trace, which happens to lie at the intersection of the hyperbolic ridge and the PSD cone (pointed out by an arrow). The gray envelope of two (m = 2) hyperplanes surrounding the hyperbolic set corresponds to the linearization of the hyperbolic set at the minimizer; this forms the basis of a related linearly constrained program later in (9).

Fix (u₁, v₁), (u₂, v₂) ∈ S_{ℓ,1}, and let α ∈ [0, 1]. Note that u₁ > 0 and u₂ > 0 as y_ℓ > 0. Consider

(1/m)(αu₁ + (1−α)u₂)(αv₁ + (1−α)v₂)
= (1/m)[(α²u₁v₁ + (1−α)²u₂v₂) + α(1−α)(u₁v₂ + u₂v₁)]
≥ (α²y_ℓ² + (1−α)²y_ℓ²) + α(1−α)(y_ℓ² u₁/u₂ + y_ℓ² u₂/u₁)
= y_ℓ² [u₁u₂ + 2α²u₁u₂ − 2αu₁u₂ + α(1−α)(u₁² + u₂²)] / (u₁u₂)
= y_ℓ² [1 + (α − α²)(u₁ − u₂)²/(u₁u₂)]
≥ y_ℓ²,

where the first inequality uses v_i ≥ m y_ℓ²/u_i, and the last follows from the fact that α ∈ [0, 1] and u₁u₂ > 0. This shows that S_{ℓ,1} is convex. The set S_{ℓ,2} is convex as the inverse image of a convex set under a linear map. This implies that S_ℓ is convex. Finally, since the intersection of any number of convex sets is convex, the constraint set of (5) is convex. This proves that (5) is a convex program.

1.3 Main Result

As we are presenting the first analytical results on this problem, we choose the subspace matrices B and C to be standard Gaussian:

B[ℓ, i] ∼ Normal(0, 1/m), (ℓ, i) ∈ [m] × [k], and C[ℓ, i] ∼ Normal(0, 1/m), (ℓ, i) ∈ [m] × [n]. (6)

Note that this choice results in b_ℓ, c_ℓ ∼ Normal(0, I). We show that with this choice the optimization program in (5) recovers a global scaling (αH⋆, α⁻¹M⋆) of the true solution (H⋆, M⋆). We will interchangeably use the notation (H, M) ∈ (H^{k×k}, H^{n×n}) to denote the pair of matrices H and M, or the block-diagonal matrix

(H, M) = [H 0; 0 M]. (7)

The exact value of the unknown scalar multiple α can be characterized for the solution of (5). Observe that the solution (Ĥ, M̂) of the convex optimization program in (5) obeys Tr(Ĥ) = Tr(M̂). We aim to show that the solution of the optimization program recovers the scaling (H̃, M̃) of the true solution (H⋆, M⋆):

(H̃, M̃) := ( √(Tr(M⋆)/Tr(H⋆)) H⋆, √(Tr(H⋆)/Tr(M⋆)) M⋆ ). (8)

Note that Tr(H̃) = Tr(M̃). The main result can now be stated as follows.

Theorem 1 (Exact Recovery).
Given the magnitude-only spectrum measurements (1) of the convolution of two unknown vectors w⋆ and x⋆ in H^m, suppose that w⋆ and x⋆ are generated as in (2), where B and C are known standard Gaussian matrices as in (6). Then the convex optimization program in (5) uniquely recovers (αH⋆, α⁻¹M⋆) for α = √(Tr(M⋆)/Tr(H⋆)) with probability at least 1 − e^{−2mt²} whenever m ≥ c(√(k + n) log m + t)², where c is an absolute constant.

1.4 Main Contributions

In this paper, we study the combination of two important and notoriously challenging signal recovery problems: phase retrieval and blind deconvolution. We introduce a novel convex formulation that is possible because the algebraic structure from lifting resolves the bilinear ambiguity just enough to permit a nontrivial convex relaxation of the measurements. The strength of our approach is that it yields a novel convex program that is the first to provably permit recovery guarantees with optimal sample complexity for the joint task of phase retrieval and blind deconvolution when the signals belong to known random subspaces. Additionally, unlike many recent convex relaxations and nonconvex approaches, our approach does not require an initialization or an estimate of the true solution in order to be stated or solved. Admittedly, our method, directly interpreted, is computationally prohibitive for large problem sizes because lifting squares the dimensionality of the problem. Nonetheless, techniques that only maintain low-rank representations, such as Burer-Monteiro approaches Burer and Monteiro [2003], have been developed for similar problems. The current work provides the theoretical justification for exploring such methods for this difficult combination of phase retrieval and blind deconvolution, and we leave that work for future research. We do not want to give the reader the impression that the present paper solves the problem of blind deconvolutional phase retrieval in practice.
The numerical experiments we perform do indeed show excellent agreement with the theorem in the case of random subspaces. Such subspaces are unlikely to appear in practice; typically, appropriate subspaces would be deterministic, including partial Discrete Cosine Transforms or partial Discrete Wavelet Transforms. Numerical experiments, not shown, indicate that our convex relaxation is less effective for these deterministic subspaces. We suspect this is because the subspaces for the two signals should be mutually incoherent, in addition to both being incoherent with respect to the Fourier basis given by the measurements. As with the initial recovery theory for the problems of compressed sensing and phase retrieval, we have studied the random case in order to show that information-theoretically optimal sample complexity is achievable by efficient algorithms. Based on this work, it is clear that blind deconvolutional phase retrieval remains a very challenging problem in the presence of deterministic matrices, and one for which the development of convex or nonconvex methods may provide substantial progress in applications.

2 Proof of Theorem 1

To prove Theorem 1, we will show that (H̃, M̃) is the unique minimizer of an optimization program with a larger feasible set defined by linear constraints.

Lemma 2. If (H̃, M̃) is the unique solution to

minimize ‖H‖* + ‖M‖* (9)
subject to (1/m)(⟨b_ℓb_ℓ*, H̃⟩⟨c_ℓc_ℓ*, M⟩ + ⟨b_ℓb_ℓ*, H⟩⟨c_ℓc_ℓ*, M̃⟩) ≥ 2y_ℓ², ℓ = 1, 2, 3, . . . , m,

then (H̃, M̃) is the unique solution to (5).

Proof. Start by observing that the trace in (5) can be replaced with the nuclear norm, as the two are equal on the set of PSD matrices. This gives

minimize ‖H‖* + ‖M‖* (10)
subject to (1/m)⟨b_ℓb_ℓ*, H⟩⟨c_ℓc_ℓ*, M⟩ ≥ y_ℓ², ℓ = 1, 2, . . . , m, H ⪰ 0, M ⪰ 0.

It now suffices to show that the feasible set of (9) contains the feasible set of (10). Recall the notation u_ℓ = ⟨b_ℓb_ℓ*, H⟩, v_ℓ = ⟨c_ℓc_ℓ*, M⟩, ũ_ℓ = ⟨b_ℓb_ℓ*, H̃⟩, and ṽ_ℓ = ⟨c_ℓc_ℓ*, M̃⟩.
We use the fact that a convex set with smooth boundary is contained in the half space defined by the tangent hyperplane at any point on the boundary of the set. Consider the point (ũ_ℓ, ṽ_ℓ) ∈ R², and observe that

{(u_ℓ, v_ℓ) ∈ R² | (1/m)u_ℓv_ℓ ≥ y_ℓ², u_ℓ ≥ 0, v_ℓ ≥ 0} ⊆ {(u_ℓ, v_ℓ) ∈ R² | u_ℓ/ũ_ℓ + v_ℓ/ṽ_ℓ ≥ 2}.

Rewriting u_ℓ and v_ℓ in the form of the original constraints, and using ũ_ℓṽ_ℓ = m y_ℓ², we have that any feasible point (H, M) of (10) satisfies

(1/m)(⟨b_ℓb_ℓ*, H̃⟩⟨c_ℓc_ℓ*, M⟩ + ⟨b_ℓb_ℓ*, H⟩⟨c_ℓc_ℓ*, M̃⟩) ≥ 2y_ℓ², ℓ = 1, 2, 3, . . . , m.

The geometry of the linearly constrained program (9) is also shown in Figure 1 (Right), where the hyperbolic set is replaced by an envelope of hyperplanes defined by the linear constraints of (9). Visually, it is clear from Figure 1 that the feasible set of (9) is larger than that of (5). Define the set

S := {(H, M) | (H, M) = α(H̃, −M̃), α ∈ [−1, 1]},

let A_ℓ := (ṽ_ℓ b_ℓb_ℓ*, ũ_ℓ c_ℓc_ℓ*) ∈ H^{(k+n)×(k+n)}, and define a linear map A : H^{(k+n)×(k+n)} → H^m as A((H, M)) = [⟨A_1, (H, M)⟩, . . . , ⟨A_m, (H, M)⟩]ᵀ; one can imagine A as a matrix with the vectorized A_ℓ as its rows. The linear constraints in (9) are A((H, M)) ≥ 2y², where the inequality applies elementwise. Furthermore, define N := span((H̃, −M̃)); it is easy to see that S ⊂ N ⊂ Null(A). We want to show that any feasible perturbation (δH, δM) around the truth (H̃, M̃) strictly increases the objective. From the discussion above, it is clear that the perturbations (δH, δM) ∈ S do not change the objective and also lead to feasible points of (9). Our general strategy will be to resolve any perturbation (δH, δM) into two components, one in N and the other in N⊥, where N⊥ is the orthogonal complement of the subspace N. The component in N does not affect the objective. We show that the components in N⊥ of all the feasible perturbations lead to a strict increase in the objective of (9). This implies that the minimizer of (9) can only lie in the set¹ (H̃, M̃) ⊕ N.
However, as we are minimizing the (nuclear) norms, an arbitrarily large scaling of the solution is prevented, and the minimizer is restricted to the subset (H̃, M̃) ⊕ S. Moreover, among these solutions, only (H̃, M̃) lies in the feasible set of (10). Given this, and the fact that (H̃, M̃) is a minimizer of (9), it follows that (H̃, M̃) is the unique minimizer of (10). We begin by characterizing the set of descent directions for the objective function of the optimization program (9). Let T_h and T_m be the sets of symmetric matrices of the form

T_h := {X = hz* + zh*}, T_m := {X = mz* + zm*},

and denote the orthogonal complements by T_h^⊥ and T_m^⊥, respectively. Note that X ∈ T_h^⊥ iff both the row and column spaces of X are perpendicular to h. P_{T_h} denotes the orthogonal projection onto the set T_h; a matrix X of appropriate dimensions is projected onto T_h as

P_{T_h}(X) := (hh*/‖h‖₂²)X + X(hh*/‖h‖₂²) − (hh*/‖h‖₂²)X(hh*/‖h‖₂²).

Similarly, define the projection operator P_{T_m}. The projections onto the orthogonal complements are then simply P_{T_h^⊥} := I − P_{T_h} and P_{T_m^⊥} := I − P_{T_m}, where I is the identity operator. We use X_{T_h} as shorthand for P_{T_h}(X). Using the notation in (7), the objective of (9) is ‖(H, M)‖*, and a subgradient of the objective at the proposed solution (H̃, M̃) is

G = (h̃h̃*/‖h̃‖₂², m̃m̃*/‖m̃‖₂²) + (W_{T_h^⊥}, W_{T_m^⊥}), ‖(W_{T_h^⊥}, W_{T_m^⊥})‖ ≤ 1.

The set Q of descent directions of the objective of (9) satisfies

Q := {(δH, δM) ∈ N⊥ : ⟨G, (δH, δM)⟩ ≤ 0 for all G ∈ ∂‖(H̃, M̃)‖*}
= {(δH, δM) ∈ N⊥ : ⟨(h̃h̃*/‖h̃‖₂², m̃m̃*/‖m̃‖₂²), (δH, δM)⟩ + ‖(δH_{T_h^⊥}, δM_{T_m^⊥})‖* ≤ 0}
⊆ {(δH, δM) ∈ N⊥ : ‖(δH_{T_h^⊥}, δM_{T_m^⊥})‖* ≤ ‖(δH_{T_h}, δM_{T_m})‖_F}. (11)

We quantify the "width" of the set of descent directions Q through a Rademacher complexity, and through the probability that the gradients of the constraint functions of (9) lie in a certain half space.

¹For a point x and a set S, the notation x ⊕ S denotes the set of points x + s for every s ∈ S.
This enables us to build an argument using the small-ball method Koltchinskii and Mendelson [2015], Mendelson [2014], showing that it is unlikely that a point satisfies the constraints in (9) and still lies in Q. Before moving forward, we introduce the above-mentioned Rademacher complexity and probability term. Denote the constraint functions by²

f_ℓ(H, M) = ũ_ℓ⟨c_ℓc_ℓ*, M⟩ + ṽ_ℓ⟨b_ℓb_ℓ*, H⟩.

For a set Q ⊆ (H^{k×k}, H^{n×n}), the Rademacher complexity of the gradients ∇f_ℓ = (∂f_ℓ/∂H, ∂f_ℓ/∂M) = (ṽ_ℓ b_ℓb_ℓ*, ũ_ℓ c_ℓc_ℓ*) is defined as

C(Q) := E sup_{(H,M)∈Q} (1/√m) Σ_{ℓ=1}^m ε_ℓ ⟨∇f_ℓ, (H, M)/‖(H, M)‖_F⟩, (12)

where ε_ℓ, ℓ = 1, 2, 3, . . . , m are iid Rademacher random variables independent of everything else in the expression. For a convex set Q, C(Q) is a measure of the width of Q around the origin in terms of the gradients ∇f_ℓ, ℓ = 1, 2, 3, . . . , m. For example, a random choice of gradients might have little overlap with a structured set Q, leading to a smaller complexity C(Q). Our result also depends on a probability p_τ(Q) and a positive parameter τ, defined as

p_τ(Q) := inf_{(H,M)∈Q} P( ⟨∇f_ℓ, (H, M)⟩ ≥ τ‖(H, M)‖_F ). (13)

The probability p_τ(Q) quantifies the visibility of the set Q through the gradient vectors ∇f_ℓ. Small values of τ and p_τ(Q) mean that the set Q mainly remains invisible through the lenses of the ∇f_ℓ, ℓ = 1, 2, 3, . . . , m. This can be appreciated by noting that p_τ(Q) depends on the correlation of the elements of Q with the gradient vectors ∇f_ℓ. The following lemma shows that the minimizer of the linearly constrained program (9) almost always resides in the desired set (H̃, M̃) ⊕ S for sufficiently large m, quantified in terms of C(Q), p_τ(Q), and τ.

Lemma 3. Consider the optimization program (9), and let Q, characterized in (11), be the set of descent directions, for which C(Q) and p_τ(Q) can be determined using (12) and (13). Choose m such that √m τ p_τ(Q) ≥ 2C(Q) + tτ√m for any t > 0. Then the minimizer (Ĥ, M̂) of (9) lies in the set (H̃, M̃) ⊕ S with probability at least 1 − e^{−2mt²}.
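To build intuition for (12), here is a quick Monte Carlo sketch of ours. We take Q to be the entire unit Frobenius ball rather than the actual descent cone, so the supremum is available in closed form as the Frobenius norm of the Rademacher sum; `xc` stands for the coefficient vector called m in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, trials = 4, 4, 100
h = rng.standard_normal(k)
xc = rng.standard_normal(n)

def c_estimate(m):
    """Estimate E sup_{||(H,M)||_F <= 1} (1/sqrt(m)) sum_l eps_l <grad f_l, (H,M)>,
    which for the unit-ball Q equals E ||(1/sqrt(m)) sum_l eps_l grad f_l||_F."""
    total = 0.0
    for _ in range(trials):
        b = rng.standard_normal((m, k))
        c = rng.standard_normal((m, n))
        u = (b @ h) ** 2             # u_l = <b_l b_l', h h'>
        v = (c @ xc) ** 2            # v_l = <c_l c_l', xc xc'>
        eps = rng.choice([-1.0, 1.0], size=m)
        GH = (b.T * (eps * v)) @ b   # sum_l eps_l v_l b_l b_l'
        GM = (c.T * (eps * u)) @ c   # sum_l eps_l u_l c_l c_l'
        total += np.sqrt(np.sum(GH ** 2) + np.sum(GM ** 2)) / np.sqrt(m)
    return total / trials
```

For this unstructured Q the estimate stays roughly constant as m grows; the descent cone Q of (11) is much thinner, which is what yields the slowly growing bound C(Q) ≲ √((k + n) log² m) established in (15).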
The proof of this lemma is based on the small-ball method developed in Koltchinskii and Mendelson [2015], Mendelson [2014] and further studied in Lecué et al. [2018], Lecué and Mendelson [2017]. The proof mainly follows the argument in Bahmani and Romberg [2017b] and is provided in the supplementary material for completeness. With Lemma 3 in place, an application of Lemma 2 and the discussion after it proves that, for the choice of m outlined in Lemma 3, (H̃, M̃) is the unique minimizer of (5). The last missing pieces in the proof of Theorem 1 are the computations of the Rademacher complexity C(Q) and the probability p_τ(Q).

²For brevity, we will often drop the dependence on H and M in the notation f_ℓ(H, M).

2.1 Rademacher Complexity

We begin with the evaluation of the complexity

C(Q) := E sup_{(δH,δM)∈Q} (1/√m) Σ_{ℓ=1}^m ε_ℓ ⟨∇f_ℓ, (δH, δM)/‖(δH, δM)‖_F⟩.

Splitting (δH, δM) between (T_h, T_m) and (T_h^⊥, T_m^⊥), and using Hölder's inequality, we obtain

C(Q) ≤ E‖(1/√m) Σ_ℓ ε_ℓ(ṽ_ℓ P_{T_h}(b_ℓb_ℓ*), ũ_ℓ P_{T_m}(c_ℓc_ℓ*))‖_F · sup_{(δH,δM)∈Q} ‖(δH_{T_h}, δM_{T_m})‖_F / ‖(δH, δM)‖_F
+ E‖(1/√m) Σ_ℓ ε_ℓ(b_ℓb_ℓ*, c_ℓc_ℓ*) max(ũ_ℓ, ṽ_ℓ)‖ · sup_{(δH,δM)∈Q} ‖(δH_{T_h^⊥}, δM_{T_m^⊥})‖* / ‖(δH, δM)‖_F.

On the set Q, defined in (11), we have ‖(δH_{T_h^⊥}, δM_{T_m^⊥})‖* ≤ ‖(δH_{T_h}, δM_{T_m})‖_F ≤ ‖(δH, δM)‖_F, so both suprema are at most 1. Using Jensen's inequality, the first expectation becomes

E‖(1/√m) Σ_ℓ ε_ℓ(ṽ_ℓ P_{T_h}(b_ℓb_ℓ*), ũ_ℓ P_{T_m}(c_ℓc_ℓ*))‖_F ≤ ( (1/m) Σ_ℓ E[ ṽ_ℓ² ‖P_{T_h}(b_ℓb_ℓ*)‖_F² + ũ_ℓ² ‖P_{T_m}(c_ℓc_ℓ*)‖_F² ] )^{1/2},

where the last step follows by carrying out the expectation over the ε_ℓ. Recall from the definition of the projection operator that

P_{T_h}(b_ℓb_ℓ*) := (hh*/‖h‖₂²)b_ℓb_ℓ* + b_ℓb_ℓ*(hh*/‖h‖₂²) − (hh*/‖h‖₂²)b_ℓb_ℓ*(hh*/‖h‖₂²),

and ṽ_ℓ = |c_ℓ*m̃|². It can easily be verified that

‖P_{T_h}(b_ℓb_ℓ*)‖_F² = 2|b_ℓ*h|²‖b_ℓ‖₂²/‖h‖₂² − |b_ℓ*h|⁴/‖h‖₂⁴,

and, therefore,

E[ ṽ_ℓ² ‖P_{T_h}(b_ℓb_ℓ*)‖_F² ] = E|c_ℓ*m̃|⁴ · E[ 2|b_ℓ*h|²‖b_ℓ‖₂²/‖h‖₂² − |b_ℓ*h|⁴/‖h‖₂⁴ ] ≤ 3‖m̃‖₂⁴ (6k − 3),

where we used a simple calculation involving fourth moments of Gaussians, E[|b_ℓ*h|²‖b_ℓ‖₂²] ≤ 3k‖h‖₂². In an exactly similar manner, we can also show that E[ ũ_ℓ² ‖P_{T_m}(c_ℓc_ℓ*)‖_F² ] ≤ 3‖h̃‖₂⁴ (6n − 3).
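The Frobenius-norm identity for P_{T_h}(b_ℓb_ℓ*) used above is pure algebra and easy to sanity-check numerically. A real-valued sketch of ours, using the projection formula from Section 2:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 6
h = rng.standard_normal(k)
b = rng.standard_normal(k)

P = np.outer(h, h) / (h @ h)       # rank-one projector h h' / ||h||^2
X = np.outer(b, b)                 # the matrix b b'
PTX = P @ X + X @ P - P @ X @ P    # P_{T_h}(X)

lhs = np.sum(PTX ** 2)             # ||P_{T_h}(b b')||_F^2
rhs = 2 * (b @ h) ** 2 * (b @ b) / (h @ h) - (b @ h) ** 4 / (h @ h) ** 2
```

The identity follows from ‖P_{T_h}(X)‖_F² = ‖X‖_F² − ‖(I − P)X(I − P)‖_F², with ‖(I − P)b‖₂⁴ appearing for X = bb'.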
Putting these together gives us

E‖(1/√m) Σ_ℓ ε_ℓ(ṽ_ℓ P_{T_h}(b_ℓb_ℓ*), ũ_ℓ P_{T_m}(c_ℓc_ℓ*))‖_F ≤ 5 max(‖h̃‖₂², ‖m̃‖₂²) √(k + n).

The second expectation can be bounded as

E‖(1/√m) Σ_ℓ ε_ℓ(ṽ_ℓ b_ℓb_ℓ*, ũ_ℓ c_ℓc_ℓ*)‖ ≤ E[max_ℓ(ũ_ℓ, ṽ_ℓ)] · E‖(1/√m) Σ_ℓ ε_ℓ(b_ℓb_ℓ*, c_ℓc_ℓ*)‖.

Standard net arguments (see, for example, Sec. 5.4.1 of Eldar and Kutyniok [2012]) show that ‖Σ_ℓ ε_ℓ(b_ℓb_ℓ*, c_ℓc_ℓ*)‖ ≤ c√(m(k + n)) with probability at least 1 − e^{−cm}, provided that m ≥ c(k + n). This directly implies that E‖(1/√m) Σ_ℓ ε_ℓ(b_ℓb_ℓ*, c_ℓc_ℓ*)‖ ≤ c√(k + n). The random variables ũ_ℓ and ṽ_ℓ, being sub-exponential, have Orlicz-1 norms bounded by c max(‖h̃‖₂², ‖m̃‖₂²). Using standard results, such as Lemma 3 in van de Geer and Lederer [2013], we then have E[max_ℓ(ũ_ℓ, ṽ_ℓ)] ≤ c max(‖h̃‖₂², ‖m̃‖₂²) log m. Putting these together yields

E‖(1/√m) Σ_ℓ ε_ℓ(ṽ_ℓ b_ℓb_ℓ*, ũ_ℓ c_ℓc_ℓ*)‖ ≤ c max(‖h̃‖₂², ‖m̃‖₂²) √((k + n) log² m). (14)

We now have all the ingredients for the final bound on C(Q):

C(Q) ≤ c max(‖h̃‖₂², ‖m̃‖₂²) √((k + n) log² m). (15)

Figure 2: Phase portraits highlighting the frequency of successful recoveries of the proposed convex program for random and deterministic channel subspaces (see the text for the experiment details).

2.2 Probability p_τ(Q)

The calculation of the probability p_τ(Q) and the positive parameter τ is given in the supplementary material due to space limitations. We find that

p_τ(Q) ≥ c > 0, and τ = c max(‖h̃‖₂², ‖m̃‖₂²). (16)

The complexity estimate in (15), the value of τ computed above, and p_τ(Q) stated in (16), together with an application of Lemma 3, prove Theorem 1.

3 Convex Implementation and Phase Transition

To implement the semidefinite convex program (5), we propose a numerical scheme based on the alternating direction method of multipliers (ADMM). Due to the space limit, the technical details of the algorithm are deferred to Section 4 of the supplementary note. To illustrate the perfect recovery region, in Figure 2 we present the phase portrait associated with the proposed convex framework. To obtain the diagram on the left panel, for each fixed value of m, we run the algorithm for 100 different combinations of n and k, each time using a different set of Gaussian matrices B and C.
If the algorithm converges to a sufficiently close neighborhood of the ground-truth solution (a distance less than 1% of the solution's ℓ₂ norm), we label the experiment as successful. Figure 2 shows the collected success frequencies, where solid black corresponds to 100% success and solid white corresponds to 0% success. For an empirically selected constant c, the success region almost perfectly stands on the left side of the line n + k = cm/log² m. While the analysis in this paper is specifically focused on Gaussian subspace embeddings for w and x, on the right panel of Figure 2 we have plotted the phase diagram for the case where B is deterministic, namely a subset of the columns of the identity matrix (equispaced sampling of the columns), and C is Gaussian as before. This hints that the convex framework is applicable to more realistic deterministic subspace models.

Acknowledgments

PH acknowledges support from NSF DMS 1464525.

References

Robert W Harrison. Phase problem in crystallography. JOSA A, 10(5):1046–1055, 1993.

Rick P Millane. Phase retrieval in crystallography and optics. JOSA A, 7(3):394–411, 1990.

Oliver Bunk, Ana Diaz, Franz Pfeiffer, Christian David, Bernd Schmitt, Dillip K Satapathy, and J Friso van der Veen. Diffractive imaging for periodic samples: retrieving one-dimensional concentration profiles across microfluidic channels. Acta Crystallographica Section A: Foundations of Crystallography, 63(4):306–314, 2007.

Jianwei Miao, Tetsuya Ishikawa, Qun Shen, and Thomas Earnest. Extending x-ray crystallography to allow the imaging of noncrystalline materials, cells, and single protein complexes. Annu. Rev. Phys. Chem., 59:387–410, 2008.

C Fienup and J Dainty. Phase retrieval and image reconstruction for astronomy. Image Recovery: Theory and Application, 231:275, 1987.

Joseph Goodman. Introduction to Fourier Optics. 2008.

Emmanuel J Candes, Thomas Strohmer, and Vladislav Voroninski.
Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 66(8):1241–1274, 2013.

Emmanuel J Candes, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval from coded diffraction patterns. Applied and Computational Harmonic Analysis, 39(2):277–299, 2015a.

Veit Elser, Ti-Yen Lan, and Tamir Bendory. Benchmark problems for phase retrieval. arXiv preprint arXiv:1706.00399, 2017.

Ahmad Helmi Azhar, Thomas Tran, and Dominic O'Brien. A gigabit/s indoor wireless transmission using MIMO-OFDM visible-light communications. IEEE Photonics Technology Letters, 25(2):171–174, 2013.

José Ramón Durán Retamal, Hassan Makine Oubei, Bilal Janjua, Yu-Chieh Chi, Huai-Yung Wang, Cheng-Ting Tsai, Tien Khee Ng, Dan-Hua Hsieh, Hao-Chung Kuo, Mohamed-Slim Alouini, et al. 4-Gbit/s visible light communication link based on 16-QAM OFDM transmission over remote phosphor-film converted white light by using blue laser diode. Optics Express, 23(26):33656–33666, 2015.

Ahmad Helmi Azhar, Tuan-Anh Tran, and Dominic O'Brien. Demonstration of high-speed data transmission using MIMO-OFDM visible light communications. In GLOBECOM Workshops (GC Wkshps), 2010 IEEE, pages 1052–1056. IEEE, 2010.

Emmanuel J Candes, Yonina C Eldar, Thomas Strohmer, and Vladislav Voroninski. Phase retrieval via matrix completion. SIAM Review, 57(2):225–251, 2015b.

Ali Ahmed, Benjamin Recht, and Justin Romberg. Blind deconvolution using convex programming. IEEE Transactions on Information Theory, 60(3):1711–1732, 2014.

Sohail Bahmani and Justin Romberg. Phase retrieval meets statistical learning theory: A flexible convex relaxation. In Aarti Singh and Jerry Zhu, editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, pages 252–260, Fort Lauderdale, FL, USA, 20–22 Apr 2017a. PMLR. URL http://proceedings.mlr.press/v54/bahmani17a.html.
Tom Goldstein and Christoph Studer. PhaseMax: Convex phase retrieval via basis pursuit. IEEE Transactions on Information Theory, 2018.

Alireza Aghasi, Ali Ahmed, and Paul Hand. BranchHull: Convex bilinear inversion from the entrywise product of signals with known signs. arXiv preprint arXiv:1702.04342, 2017a.

Alireza Aghasi, Ali Ahmed, and Paul Hand. Convex inversion of the entrywise product of real signals with known signs. In Signals, Systems, and Computers, 2017 51st Asilomar Conference on, pages 1622–1626. IEEE, 2017b.

Samuel Burer and Renato D.C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming, 95(2):329–357, Feb 2003. ISSN 1436-4646. doi: 10.1007/s10107-002-0352-8. URL https://doi.org/10.1007/s10107-002-0352-8.

Vladimir Koltchinskii and Shahar Mendelson. Bounding the smallest singular value of a random matrix without concentration. International Mathematics Research Notices, 2015(23):12991–13008, 2015.

Shahar Mendelson. Learning without concentration. In Conference on Learning Theory, pages 25–39, 2014.

Guillaume Lecué, Shahar Mendelson, et al. Regularization and the small-ball method I: sparse recovery. The Annals of Statistics, 46(2):611–641, 2018.

Guillaume Lecué and Shahar Mendelson. Regularization and the small-ball method II: complexity dependent error rates. The Journal of Machine Learning Research, 18(1):5356–5403, 2017.

Sohail Bahmani and Justin Romberg. Anchored regression: Solving random convex equations via convex programming. arXiv preprint arXiv:1702.05327, 2017b.

Yonina C Eldar and Gitta Kutyniok. Compressed Sensing: Theory and Applications. Cambridge University Press, 2012.

Sara van de Geer and Johannes Lederer. The Bernstein–Orlicz norm and deviation inequalities. Probability Theory and Related Fields, 157(1-2):225–250, 2013.