The Generalized Bregman Distance

Recently, a new distance has been introduced for the graphs of two point-to-set operators, one of which is maximally monotone. When both operators are the subdifferential of a proper lower semicontinuous convex function, this distance specializes under modest assumptions to the classical Bregman distance. We name this new distance the generalized Bregman distance, and we shed light on it with examples that utilize the other two most natural representative functions: the Fitzpatrick function and its conjugate. We provide sufficient conditions for convexity, coercivity, and supercoercivity: properties which are essential for implementation in proximal point type algorithms. We establish these results for both the left and right variants of this new distance. We construct examples closely related to the Kullback-Leibler divergence, which was previously considered in the context of Bregman distances, and whose importance in information theory is well known. In so doing, we demonstrate how to compute a difficult Fitzpatrick conjugate function, and we discover natural occurrences of the Lambert $\mathcal{W}$ function, whose importance in optimization is of growing interest.

In 1967, Bregman introduced the distance constructed for a differentiable convex function f, which now bears his name [11] and whose corresponding envelopes and proximity operators specialize to the Moreau proximity operator [26,1,2] and envelope when f is the energy $\|\cdot\|^2/2$. When f is differentiable, the Bregman distance may be written as
$D_f(x,y) := f(x) - f(y) - \langle \nabla f(y), x - y\rangle$ (2a)
$\phantom{D_f(x,y) :}= f(x) + f^*(\nabla f(y)) - \langle \nabla f(y), x\rangle.$ (2b)
Here (2a) is the definition of the Bregman distance and (2b) uses the Fenchel-Young equality: $(\forall v \in \nabla f(y))\; f(y) + f^*(v) = \langle y, v\rangle$. From (2b), Burachik and Martínez-Legaz [14] made the observation that $D_f(x,y) = (f \oplus f^*)(x, \nabla f(y)) - \langle x, \nabla f(y)\rangle$, and so $D_f$ has the dual characterization of serving as a distance between gradients. Based on this characterization, they introduced a distance based on the representative function h of a monotone operator. This distance generalizes the Bregman distance, specializing thereto (under the mild domain conditions in 2.2) when h is the Fenchel-Young representative for $T = \nabla f$, which is defined by $(f \oplus f^*)(x, y) = f(x) + f^*(y)$.
Naturally, we name this more general distance the generalized Bregman distance (GBD). In lieu of the Fenchel-Young representative, the Fitzpatrick function and its conjugate are the two other functions that are most natural to consider. As with the Bregman distance, we obtain left and right variants; these admit new left and right coercivity and supercoercivity properties, along with envelopes and corresponding proximity operators, when the GBD replaces the Bregman distance in the construction of envelopes.

Outline and Contributions
This article is outlined as follows. In Section 2, we recall the GBD as introduced in [14]. We provide its basic properties and clarify the domain conditions under which the Fenchel-Young representative case specializes to the Bregman distance. We also introduce the closed variant, which may specialize to the Bregman distance at more points in the boundary of the domain.
In Section 3, we show how to compute the new GBDs. We first illustrate with the energy, which specializes to the Moreau case when the Fenchel-Young representative is used. We also illustrate with the Boltzmann-Shannon entropy, whose derivative is the logarithm and whose Bregman distance, the Kullback-Leibler divergence, is commonly used as a measure of the difference between positive vectors in information theory and elsewhere. We compare the Kullback-Leibler divergence with the similar Fenchel-Young representative GBD for the logarithm, and also with its closed version. We also illustrate how to compute with the two other most natural representatives to consider: the book-end cases for the representative function set. These are the smallest representative, named the Fitzpatrick function, and the biggest representative, which is obtained using the conjugate of the Fitzpatrick function.
Interestingly, while the Fitzpatrick function for the logarithm was discovered in [8], the present work contains the first computation of its conjugate. The discovery and proof rely upon the graphical characterizations of representative functions, and the special function Lambert W plays an important role in the computational aspects of discovery. Our approach to this problem is prototypical of what one might need when computing other representative functions and GBDs. We furnish a full discussion of the process in Section 3.1, so that it may serve as a tutorial for other researchers.
Section 4 contains our most important results. We provide a framework of sufficient conditions for coercivity and supercoercivity of the left and right GBDs. This framework uses the fact that the GBDs majorize a set distance. We illustrate with figures in 2 dimensions, and we provide examples of what may go wrong when the sufficient conditions provided by our framework are not satisfied. In Section 4.3, we explain how these coercivity and supercoercivity conditions may be used to guarantee the coercivity of the sum of the distance together with a Legendre function.
Such sums are the basis of the corresponding envelope functions and their proximity operators. In the Bregman case, these coercivity conditions admit the further analysis of the envelopes and proximity operators, including their asymptotic behaviour as the scalar parameter varies [6]. Our work lays the necessary foundation for such an analysis in the case of envelopes built from GBDs. The study of envelopes is important, because many optimization algorithms may be viewed as special cases of gradient descent applied to envelopes; see, for example, [29,27]. We conclude in Section 5.

Preliminaries on Generalized Bregman Distances
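Given a function $f : X \to \mathbb{R}_\infty$, its domain (or effective domain) is defined by $\operatorname{dom} f := \{x \in X : f(x) < \infty\}$ and its lower level set at height $\xi \in \mathbb{R}$ by $\operatorname{lev}_{\le \xi} f := \{x \in X : f(x) \le \xi\}$. The Fenchel conjugate of f is the mapping $f^*$ defined on $X^*$ by $f^*(v) := \sup_{x \in X}\{\langle x, v\rangle - f(x)\}$. From the definition, we have the Fenchel-Young inequality $f(x) + f^*(v) \ge \langle x, v\rangle$ for all $(x, v) \in X \times X^*$, and if f is convex, then equality holds exactly when $v \in \partial f(x)$. Given a point-to-set operator $T : X \rightrightarrows X^*$, its domain is $\operatorname{dom} T := \{x \in X : Tx \ne \emptyset\}$, its range is $\operatorname{ran} T := T(X)$, and its graph is $G(T) := \{(x, v) \in X \times X^* : v \in Tx\}$. The operator T is monotone if $\langle x - y, u - v\rangle \ge 0$ for all $(x,u), (y,v) \in G(T)$, and maximally monotone if, in addition, its graph is not properly contained in the graph of any other monotone operator. A detailed study of maximally monotone operators can be found in [4, Chapters 20 and 21] for Hilbert spaces, and in [17, Chapter 4] for the Banach space case.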

Representative Functions
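Let $S : X \rightrightarrows X^*$ be a maximally monotone operator. We recall from [14, Definition 2.3] that $h : X \times X^* \to \mathbb{R}_\infty$ represents S, denoted $h \in H(S)$, if the following three conditions hold: (a) h is convex and norm × weak* lower semicontinuous in $X \times X^*$; (b) $h(x, v) \ge \langle x, v\rangle$ for all $(x, v) \in X \times X^*$; (c) $h(x, v) = \langle x, v\rangle$ whenever $(x, v) \in G(S)$.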
We will make use, in particular, of several representative functions. These are as follows.
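The largest member of H(S) we denote by $\sigma_S$; it may be computed using the identities in Fact 2.1.
Fact 2.1. Let X be a real Banach space and $S : X \rightrightarrows X^*$ a maximally monotone operator. We have the following characterizations of $\sigma_S$ and $F_S^*$. (i) From [19, Equation (33)], we have that $\sigma_S = \overline{\operatorname{co}}(\pi + \iota_{G(S)})$, where $\pi := \langle \cdot, \cdot\rangle$ is the duality product defined in $X \times X^*$, $\iota_{G(S)}$ is the indicator function of G(S), and $\overline{\operatorname{co}}(A)$ is an abbreviation for the closure of the convex hull of a set A. (ii) The Fitzpatrick function satisfies $F_S(x, v) = (\pi + \iota_{G(S)})^*(v, x)$. (iii) $\sigma_S(x, y) = F_S^*(y, x)$. (iv) By [19, Corollary 4.2], if h is convex and lower semicontinuous on $X \times X^*$ and $F_S \le h \le \sigma_S$, then $h \in H(S)$.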
The astute reader will notice that (iii) may be obtained by combining (i) and (ii), since the biconjugate of a function coincides with its closed convex hull whenever the latter is proper. Additionally, (iv) is quite pleasing, because it admits as representative functions the convex combinations of other representative functions.

A new "Generalized Bregman" Distance between point-to-set operators
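From now on, we assume that $S : X \rightrightarrows X^*$ is a maximally monotone operator, $h \in H(S)$, and $T : X \rightrightarrows X^*$. Following [14, Definition 3.1], for fixed $(x, y) \in \operatorname{dom} S \times \operatorname{dom} T$, we define
$D^{\flat,h}_T(x, y) := \inf_{v \in Ty}\,\big(h(x, v) - \langle x, v\rangle\big)$ and $D^{\sharp,h}_T(x, y) := \sup_{v \in Ty}\,\big(h(x, v) - \langle x, v\rangle\big)$, (13a)-(13b)
and, if $(x, y) \notin \operatorname{dom} S \times \operatorname{dom} T$, we set $D^{\flat,h}_T(x, y) = D^{\sharp,h}_T(x, y) := +\infty$. When T is point to point, we simply write $D^h_T := D^{\flat,h}_T = D^{\sharp,h}_T$. For our examples, $T = S = \partial f$ is point-to-point on int dom f, in which case we simply write $D^h$. Additionally, when employing a specific representative function, we will use the name of the representative function used. Specifically: (i) When h is the Fitzpatrick function $F_S$ for a maximally monotone operator S, we will write $D^{F_S}$; (ii) When h is $\sigma_S$, the largest member of H(S), we will write $D^{\sigma_S}$; (iii) When $h := f \oplus f^* \in H(\partial f)$ for $f \in \Gamma_0(H)$ is the Fenchel-Young representative, we denote this by $D^{f \oplus f^*}$.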
If a distance is of the form (13b) or (13a) we call it a generalized Bregman distance or GBD for short. The GBD specializes to the Bregman distance under certain circumstances, which we now recall. To a proper and convex function $f : X \to \mathbb{R}_\infty$, we associate two Bregman distances (see [25]). Burachik and Martínez-Legaz observed that the GBD specializes to the Bregman distance in the case where the Fenchel-Young representative distance is used. The following proposition supplies a detail omitted from [14, Proposition 3.5], namely that the Fitzpatrick distance specializes to the Bregman distance under a mild domain condition. In the case when $\operatorname{dom} f \setminus \operatorname{dom} \partial f = \emptyset$, they are everywhere equal. We will see later in an example that when f is the Boltzmann-Shannon entropy (21), we have $\operatorname{dom} f \setminus \operatorname{dom} \partial f = \{0\}$, and the two distances fail to be equal on the set $\{(0, y) \mid y > 0\}$.

Proposition 2.2 (The GBD generalizes the Bregman distance).
Let f ∈ Γ 0 (X). Then It suffices to assume that y ∈ dom ∂ f and . The conclusion follows.
We recall now the following results regarding the lower semicontinuity of the left and right distances; these apply to each of our computed examples.

Lemma 2.3 ([14, Lemma 3.17]).
Let y ∈ dom T. Then the following hold: with respect to the strong topology in X provided that Tz is weakly closed for any z in its domain; with respect to the strong topology in X.

Lemma 2.4 ([14, Lemma 3.18]).
Suppose that T is locally bounded in the interior of its domain and that the graph of T is closed with respect to the strong-weak topology. Fix y ∈ int dom T and x ∈ dom S. Then the function D ,h T (x, ·) : X → R ∞ is lsc at y with respect to the strong topology in X.
Remark 2.5 (The lower closed distance). Notice that in Lemma 2.3(ii), D ,h T (·, y) may not be lower semicontinuous at $x \in \overline{\operatorname{dom} \partial f} \setminus \operatorname{dom} \partial f$, a case we will encounter in our examples. Notice also that in Lemma 2.4, for $y \notin \operatorname{int} \operatorname{dom} T$, the distance may not be lower semicontinuous with respect to the second variable, a phenomenon we will encounter in our examples.
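For these two reasons, we also introduce the notion of the lower closed GBD, $\overline{D}^{\,\natural,h}_T$, which satisfies $\operatorname{epi} \overline{D}^{\,\natural,h}_T = \overline{\operatorname{epi} D^{\natural,h}_T}$, where $\natural$ may be either $\flat$ or $\sharp$. The lower closed GBD is the lower semicontinuous regularization of the function $D^{\natural,h}_T$, as described in [28]; its direct formula is given by $\overline{D}^{\,\natural,h}_T(x, y) = \liminf_{(u, w) \to (x, y)} D^{\natural,h}_T(u, w)$, where the limit inferior is taken over all sequences converging to (x, y), constant sequences included. The lower closed distances $\overline{D}^{F_S}$, $\overline{D}^{\sigma_S}$, and $\overline{D}^{f \oplus f^*}$ are defined analogously.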

How to compute Generalized Bregman Distances
Example 3.1 (Energy). Let $f : x \mapsto \tfrac{1}{2}\|x\|^2$ be the energy. If we take h to be the Fitzpatrick function $F_{\mathrm{Id}}(x, y) = \tfrac{1}{4}\|x + y\|^2$ for $\partial f = \mathrm{Id}$, then our GBD is $D^{F_{\mathrm{Id}}}(x, y) = \tfrac{1}{4}\|x + y\|^2 - \langle x, y\rangle = \tfrac{1}{4}\|x - y\|^2$, which is equivalent to a scaled version of the usual Moreau distance. On the other hand, the largest element of H(Id) is just $\sigma_{\mathrm{Id}}(x, y) = \|x\|^2$ if $x = y$, and $+\infty$ otherwise. One can obtain this result by computing $F^*_{\mathrm{Id}}$ straight from the definition of the conjugate and using Fact 2.1(iii). One can also obtain this result by using Fact 2.1(i), because the graph of Id is simply the diagonal. The corresponding distance is $D^{\sigma_{\mathrm{Id}}}(x, y) = 0$ if $x = y$, and $+\infty$ otherwise.
In [6], the asymptotic properties of Bregman envelopes are illustrated using Bregman distances constructed from three functions. One of these was the energy from Example 3.1, for which the Bregman proximity operator and envelope specialize to the Moreau case. While the choice of representative function $F_{\mathrm{Id}}$ is equivalent to the Moreau case up to a change in parameter, notice that the example $D^{\sigma_{\mathrm{Id}}}$ illustrates that this is not the case for every choice of representative function. This is an important distinction in our context.
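As a sanity check of Example 3.1 on X = R, the following minimal Python sketch compares a grid approximation of the supremum defining the Fitzpatrick function of Id, namely sup over a of a·v + x·a − a², with the closed form (x + v)²/4, and then evaluates the resulting distance. The function names, the grid bounds, and the grid size are illustrative assumptions, not part of the original text.

```python
def fitzpatrick_id_sup(x, v, lo=-50.0, hi=50.0, steps=20001):
    """Grid approximation of sup_a { a*v + x*a - a*a } (Fitzpatrick function of Id on R)."""
    best = float("-inf")
    for i in range(steps):
        a = lo + (hi - lo) * i / (steps - 1)
        best = max(best, a * v + x * a - a * a)
    return best


def fitzpatrick_id(x, v):
    """Closed form (x + v)^2 / 4 of the same supremum."""
    return 0.25 * (x + v) ** 2


def gbd_fitzpatrick_id(x, y):
    """GBD for S = T = Id with h = F_Id: since Ty = {y}, it is F_Id(x, y) - x*y."""
    return fitzpatrick_id(x, y) - x * y


if __name__ == "__main__":
    x, y = 3.0, 1.0
    print(fitzpatrick_id_sup(x, y), fitzpatrick_id(x, y))
    print(gbd_fitzpatrick_id(x, y), 0.25 * (x - y) ** 2)
```

The two printed pairs should agree up to the grid resolution, which illustrates that this GBD is one half of the Moreau distance (x − y)²/2.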

Boltzmann-Shannon Entropy
Another function considered in [6] is the (negative) Boltzmann-Shannon entropy, defined as follows: $\operatorname{ent}(x) := x \log x - x$ if $x > 0$, $\operatorname{ent}(0) := 0$, and $\operatorname{ent}(x) := +\infty$ otherwise. (21) The Boltzmann-Shannon entropy is particularly important and natural to consider, because its derivative is log, its conjugate is $\operatorname{ent}^* = \exp$, and its associated Bregman distance is the Kullback-Leibler divergence, which is frequently used as a measure of distance between positive vectors in information theory, statistics, and portfolio selection. The GBD associated with the Fenchel-Young representative $\operatorname{ent} \oplus \operatorname{ent}^* \in H(\log)$ is $D^{\operatorname{ent} \oplus \operatorname{ent}^*}(x, y) = x \log(x/y) - x + y$ if $x > 0$ and $y > 0$, and $+\infty$ otherwise. Thus it may be seen that the Bregman distance of the Boltzmann-Shannon entropy is the special case of the GBD for the Fenchel-Young representative of the logarithm function, except on the set $\{(0, y) \mid y > 0\}$. The distance is shown in Figure 3b, while the Fenchel-Young representative $\operatorname{ent} \oplus \operatorname{ent}^*$ is shown in Figure 2b.
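The comparison between the Kullback-Leibler divergence and the Fenchel-Young GBD for the logarithm can be sketched numerically as follows. The sketch assumes ent(x) = x log x − x on ]0, ∞[ with ent(0) = 0, uses ent* = exp, and applies the convention that the GBD is +∞ outside dom log × dom log; all function names are illustrative.

```python
import math


def ent(x):
    """(Negative) Boltzmann-Shannon entropy, assuming ent(x) = x*log(x) - x and ent(0) = 0."""
    if x > 0:
        return x * math.log(x) - x
    if x == 0:
        return 0.0
    return math.inf


def gbd_fenchel_young_log(x, y):
    """GBD for S = T = log with h = ent (+) ent*: ent(x) + exp(v) - x*v at v = log(y)."""
    if x <= 0 or y <= 0:  # outside dom log x dom log the GBD is +infinity
        return math.inf
    v = math.log(y)
    return ent(x) + math.exp(v) - x * v


def kullback_leibler(x, y):
    """Bregman distance of ent: ent(x) - ent(y) - log(y)*(x - y), for x >= 0 and y > 0."""
    if x < 0 or y <= 0:
        return math.inf
    return ent(x) - ent(y) - math.log(y) * (x - y)


if __name__ == "__main__":
    print(gbd_fenchel_young_log(2.0, 3.0), kullback_leibler(2.0, 3.0))  # agree for x, y > 0
    print(gbd_fenchel_young_log(0.0, 1.0), kullback_leibler(0.0, 1.0))  # differ at x = 0
```

For x, y > 0 the two values coincide, while at x = 0 the Bregman distance is finite and the GBD is +∞, which is the discrepancy that motivates the lower closed variant.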
We will consider new distances built from the maximally monotone operator log, and compare these to the known special case of the Bregman distance for the Boltzmann-Shannon entropy. The corresponding Fitzpatrick function (as computed in [8] and shown in Figure 2a) is $F_{\log}(x, v) = x\big(W(x e^{1-v}) + 1/W(x e^{1-v}) + v - 2\big)$ if $x > 0$, $F_{\log}(0, v) = e^{v-1}$, and $F_{\log}(x, v) = +\infty$ otherwise, where W is the real principal branch of the Lambert W function, which satisfies $W(x) e^{W(x)} = x$ [23]. Its occurrences in convex analysis have been discussed in, for example, [7,9].
This distance is shown in Figure 3a.
Proof. Combining Definitions 13 and 26 with the fact that dom log = ]0, +∞[, we have $D^{F_{\log}}(x, y) = F_{\log}(x, \log y) - x \log y$ for $x, y > 0$. This simplifies, by a bit of arithmetic, to the form in (26), except on the set {0} × [0, ∞[. Taking the closure of the epigraph admits $\overline{D}^{F_{\log}}(0, y) = y e^{-1}$. This example is particularly interesting, because we see the loss of the left lower semicontinuity property at 0 because $0 \in \operatorname{dom} f \setminus \operatorname{dom} \log$, and we also see the loss of the right lower semicontinuity property because $0 \notin \operatorname{int} \operatorname{dom} \log$; see Remark 2.5.
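The role of Lambert W in this computation can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the closed form F_log(x, v) = x(W + 1/W + v − 2) with W = W(x e^{1−v}) for x > 0, and e^{v−1} for x = 0, which follows from the first-order condition of the supremum sup over a > 0 of a·v + x·log a − a·log a, and compares it with a grid approximation of that supremum. The principal branch is evaluated by a small Newton iteration written for this sketch rather than by a library routine; all names and grid parameters are hypothetical.

```python
import math


def lambert_w0(z, tol=1e-15, max_iter=100):
    """Principal real branch of Lambert W for z >= 0, by Newton's method on w*exp(w) = z."""
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    w = math.log1p(z)  # lies at or above the root, so the Newton iterates decrease to it
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def fitzpatrick_log(x, v):
    """Assumed closed form of the Fitzpatrick function of log."""
    if x < 0:
        return math.inf
    if x == 0:
        return math.exp(v - 1.0)
    w = lambert_w0(x * math.exp(1.0 - v))
    return x * (w + 1.0 / w + v - 2.0)


def fitzpatrick_log_sup(x, v, s_lo=-20.0, s_hi=20.0, steps=40001):
    """Grid approximation of sup_{a>0} { a*v + x*log(a) - a*log(a) }, with a = exp(s)."""
    best = float("-inf")
    for i in range(steps):
        s = s_lo + (s_hi - s_lo) * i / (steps - 1)
        a = math.exp(s)
        best = max(best, a * v + x * s - a * s)
    return best


def gbd_fitzpatrick_log(x, y):
    """GBD for S = T = log with h = F_log, on dom log x dom log."""
    if x <= 0 or y <= 0:
        return math.inf
    return fitzpatrick_log(x, math.log(y)) - x * math.log(y)


if __name__ == "__main__":
    print(fitzpatrick_log(2.0, 0.3), fitzpatrick_log_sup(2.0, 0.3))
    print(gbd_fitzpatrick_log(1e-9, 2.0), 2.0 / math.e)  # behaviour as x decreases to 0
```

The second printed pair illustrates why the closure of the epigraph assigns the value y/e at (0, y): the distance tends to y/e as x decreases to 0, even though the unclosed distance is +∞ at x = 0.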

Computation of a difficult representative function and the conjugate of a Fitzpatrick function
Next we consider the case where S = log is the logarithm function on ]0, ∞[. Even though we know the form of F log , it is not straightforward to compute σ log using the equality σ log (x, y) = F * log (y, x) from Fact 2.1(iii) by subdifferentiating with the latter and solving. Instead, we use the characterization from Fact 2.1(i).
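Recall that, for an arbitrary function g and its convex hull function co(g), the lower semicontinuous regularization or lower closure, denoted $\overline{\operatorname{co}}(g)$, has the property $\operatorname{epi}(\overline{\operatorname{co}}(g)) = \overline{\operatorname{co}}(\operatorname{epi}(g))$. (28)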
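Theorem 3.3 (The representative $\sigma_{\log}$). Let T = S = log. Then $\sigma_{\log}(z_1, z_2) = z_1 \log z_1$ if $z_1 > 0$ and $z_2 \le \log z_1$, and $\sigma_{\log}(z_1, z_2) = +\infty$ otherwise, whose graph is shown in Figure 2c.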
Proof. Using Fact 2.1 together with the fact that $G(S) = \{(z_1, \log(z_1)) \mid z_1 \in\, ]0, \infty[\}$ and the fact that log is a concave function, we have that $z_2 > \log(z_1)$ implies $\sigma_{\log}(z_1, z_2) = \infty$. Indeed, let $g : \mathbb{R}^2 \to \mathbb{R}_\infty$ be defined as $g := \pi + \iota_{G(S)}$; the graph of g is shown as the dark curve at the boundary of the surface in Figure 2c. By Fact 2.1 we have that $\sigma_{\log} = \overline{\operatorname{co}}(g)$. If $\overline{\operatorname{co}}(g)(z_1, z_2) < \infty$ then there exists $a \in \mathbb{R}$ such that $(z_1, z_2, a) \in \operatorname{epi} \overline{\operatorname{co}}(g)$. By (28) this is equivalent to $(z_1, z_2, a) \in \overline{\operatorname{co}} \operatorname{epi}(g)$.
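The last inclusion means that there exists a sequence $w^n := (z^n_1, z^n_2, a^n) \in \operatorname{conv} \operatorname{epi}(g)$ such that $(z_1, z_2, a) = \lim_{n \to \infty} (z^n_1, z^n_2, a^n)$. Note that we can assume that $(z^n_1, z^n_2, a^n) = \sum_{i=1}^{4} \lambda_{n,i} (z^n_{1,i}, z^n_{2,i}, a_{i,n})$, with $(z^n_{1,i}, z^n_{2,i}, a_{i,n}) \in \operatorname{epi}(g)$, $\lambda_{n,i} \ge 0$, and $\sum_{i=1}^{4} \lambda_{n,i} = 1$, thanks to Carathéodory's theorem. Using the fact that $(z^n_{1,i}, z^n_{2,i}, a_{i,n}) \in \operatorname{epi}(g)$, we have that $a_{i,n} \ge g(z^n_{1,i}, z^n_{2,i})$, which is finite only if $(z^n_{1,i}, z^n_{2,i}) \in G(S)$, that is, $z^n_{2,i} = \log z^n_{1,i}$. Using the above expression for $z^n_2$ and the fact that $z^n_{2,i} = \log z^n_{1,i}$, we can write $z^n_2 = \sum_{i=1}^{4} \lambda_{n,i} \log z^n_{1,i} \le \log\big(\sum_{i=1}^{4} \lambda_{n,i} z^n_{1,i}\big) = \log z^n_1$, where we used the fact that log(·) is concave. Taking limits and using the continuity of log(·) we deduce that $z_2 \le \log(z_1)$. This implies that, when $z_2 > \log(z_1)$, we must have $\sigma_{\log}(z_1, z_2) = \infty$. This shows the second part of the definition in the statement of the theorem.
We proceed now to prove the first part of the definition of $\sigma_{\log}$. Let $z \in \mathbb{R}^2$ be such that $z_1 > 0$ and $z_2 \le \log(z_1)$. For any $x, y \in G(S)$ which satisfy $\lambda x + (1 - \lambda) y = z$ for some $\lambda \in [0, 1]$, we have that $g(x) = \langle x_1, x_2\rangle = x_1 x_2$ and that $g(y) = \langle y_1, y_2\rangle = y_1 y_2$. Thus $(x_1, x_2, x_1 x_2) \in \operatorname{epi} g$ and $(y_1, y_2, y_1 y_2) \in \operatorname{epi} g$.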
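Let $(\phi_n)_{n \in \mathbb{N}} \subset\, ]0, \pi/2[$ be a sequence which satisfies $\lim_{n \to \infty} \phi_n = \pi/2$. For any $\phi_n$, we may find a line in $\mathbb{R}^2$ which goes through z and has slope $\tan(\phi_n)$, which is given by $L_n := \{(t, z_2 + \tan(\phi_n)(t - z_1)) \mid t \in \mathbb{R}\}$. Now $\phi_n \in\, ]0, \pi/2[$ and $z_2 \le \log(z_1)$ guarantee that $L_n \cap G(S)$ is a doubleton $\{x^n, y^n\}$ where $x^n_1 < z_1$ and $y^n_1 > z_1$ and $z = p_n x^n + q_n y^n$ with $q_n + p_n = 1$, $p_n, q_n \in [0, 1]$. The construction of this sequence is shown in Figure 1. As $\phi_n \to \pi/2$, the slope $\tan(\phi_n)$ of $L_n$ goes to infinity, and so we have that $\lim_{n \to \infty} y^n = (z_1, \log(z_1))$ and $\lim_{n \to \infty} x^n = (0, -\infty)$, and so $\lim_{n \to \infty} p_n = 0$ and $\lim_{n \to \infty} q_n = 1$. Thus we have that $\lim_{n \to \infty} (z_1, z_2, p_n x^n_1 \log(x^n_1) + q_n y^n_1 \log(y^n_1)) = (z_1, z_2, z_1 \log(z_1)) \in \overline{\operatorname{co}}(\operatorname{epi} g) = \operatorname{epi} \sigma_{\log}$.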
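Thus $z_1 \log(z_1) \ge \sigma_{\log}(z)$ for every $(z_1, z_2)$ such that $z_1 > 0$ and $z_2 \le \log(z_1)$. For the converse inequality, define the function $w(z_1, z_2) := z_1 \log z_1$ if $z_1 > 0$ and $z_2 \le \log(z_1)$, and $w(z_1, z_2) := +\infty$ otherwise. The function w is convex and lsc. It is easy to check that $w \le g$ in $\mathbb{R}^2$. Therefore, $\operatorname{epi} w \supset \operatorname{epi} g$. Using the fact that w is convex and lsc we deduce that $\operatorname{epi} w = \overline{\operatorname{co}}(\operatorname{epi} w) \supset \overline{\operatorname{co}}(\operatorname{epi} g) = \operatorname{epi} \sigma_{\log}$, equivalently, $w \le \sigma_{\log}$. This implies that $z_1 \log(z_1) \le \sigma_{\log}(z)$ for every $(z_1, z_2)$ such that $z_1 > 0$ and $z_2 \le \log(z_1)$. Consequently, we showed that $\sigma_{\log}(z_1, z_2) = z_1 \log z_1$ if $z_1 > 0$ and $z_2 \le \log(z_1)$, and $+\infty$ otherwise, which is the claim of the theorem.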

Corollary 3.4 (The conjugate of the Fitzpatrick function F log ).
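We have that $F^*_{\log}(y, x) = x \log x$ if $x > 0$ and $y \le \log x$, and $F^*_{\log}(y, x) = +\infty$ otherwise.
Proof. This immediately follows from Theorem 3.3 together with Fact 2.1(iii).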

Remark 3.5.
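In the proof of Theorem 3.3, the sequences $x^n$, $y^n$ may be given explicitly by $x^n_1 = -W_0(-m_n e^{z_2 - m_n z_1})/m_n$ and $y^n_1 = -W_{-1}(-m_n e^{z_2 - m_n z_1})/m_n$, with $m_n := \tan(\phi_n)$, $x^n_2 = \log x^n_1$, and $y^n_2 = \log y^n_1$, where $W_0$ and $W_{-1}$ are the principal and secondary real branches of the Lambert W function.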
Most of the analysis of Lambert W in the context of convex optimization has focused on its principal branch. However, in order to experimentally discover the true form for $\sigma_{\log}$ from Theorem 3.3, we had to make use of both real branches. The reason for this is that our attempts to explicitly solve the systems were not successful. Seeking to compute $\sigma_{\log}$ numerically, we constructed a numerical procedure which evaluated $\lambda x^n_1 x^n_2 + (1 - \lambda) y^n_1 y^n_2$ for a finite sequence $(\phi_n)_{n=1}^{N}$ and chose the smallest value to represent $\sigma_{\log}(z)$. The fast evaluation of Lambert W obviated the implementation of slower numerical routines to solve the equation system. We observed that the smallest value was always the last value, corresponding to $\phi_n$ nearest to π/2. Once we observed that the values were consistently approaching $z_1 \log(z_1)$ for any z chosen, we "knew" the true form. Upon further scrutiny of the geometry, we realized that the sequences which led to the discovery also yielded the proof.
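The experiment described in this remark can be imitated with a short Python sketch. It is not the authors' routine: in place of the two Lambert W branches it locates the two intersection points of the line through z with the graph of log by plain bisection, which is slower but needs no special functions. The function names, the test point, and the list of slopes are illustrative assumptions, and the sketch assumes $z_1 > 0$ and $z_2 < \log(z_1)$.

```python
import math


def graph_intersections(z1, z2, slope, iters=200):
    """Intersect the line through (z1, z2) with the given slope with the graph of log.

    Assumes z1 > 0, z2 < log(z1), slope > 0. Returns (log_x1, y1): the logarithm of the
    first coordinate of the left intersection point, and the first coordinate of the
    right intersection point.
    """
    # Left root, bisected in s = log(t) so that extremely small t causes no trouble.
    def h_of_log(s):
        return s - z2 - slope * (math.exp(s) - z1)

    lo, hi = z2 - slope * z1 - 1.0, math.log(z1)  # h_of_log(lo) < 0 < h_of_log(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h_of_log(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    log_x1 = 0.5 * (lo + hi)

    # Right root, bisected in t directly.
    def h(t):
        return math.log(t) - z2 - slope * (t - z1)

    lo, hi = z1, 2.0 * z1 + 1.0
    while h(hi) > 0.0:  # enlarge the bracket until the sign changes
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return log_x1, 0.5 * (lo + hi)


def two_point_value(z1, z2, slope):
    """lam*x1*x2 + (1 - lam)*y1*y2 with x2 = log(x1), y2 = log(y1), lam*x + (1 - lam)*y = z."""
    log_x1, y1 = graph_intersections(z1, z2, slope)
    x1 = math.exp(log_x1)
    lam = (y1 - z1) / (y1 - x1)
    return lam * x1 * log_x1 + (1.0 - lam) * y1 * math.log(y1)


if __name__ == "__main__":
    z1, z2 = 2.0, 0.0
    for slope in (1.0, 10.0, 100.0, 1000.0):
        print(slope, two_point_value(z1, z2, slope))
    print("z1*log(z1) =", z1 * math.log(z1))
```

For the sample point the printed values decrease as the slope grows and approach $z_1 \log(z_1)$ from above, which matches the behaviour reported in the remark.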
which is shown in Figure 3c.
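Proof. From the definition and Theorem 3.3: $D^{\sigma_{\log}}(x, y) = \sigma_{\log}(x, \log y) - x \log y = x \log x - x \log y$ whenever $0 < y \le x$, and $D^{\sigma_{\log}}(x, y) = +\infty$ otherwise, which may be recognized as the form in (45) except at the point (0, 0). Taking the closure of the epigraph of $D^{\sigma_{\log}}$, we obtain $\overline{D}^{\sigma_{\log}}(0, 0) = 0$. This example is illustrative, because lower semicontinuity is lost, but only at the point (0, 0), since $0 \notin \operatorname{dom} \log$; see Remark 2.5.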

A coercivity framework for the Generalized Bregman Distance
In this section, we will establish important properties about the GBD. From now on, X is a reflexive real Banach space. For the sake of simplicity, when we make use of the norms $\|\cdot\|_{X \oplus X^*}$, $\|\cdot\|_X$, and $\|\cdot\|_{X^*}$, we allow context to make clear which norm is being used.

Convexity
Proposition 4.1. Let x ∈ dom S and y ∈ dom T. Then the following hold: Proof. We first note that h is convex on X × X * by definition.

Coercivity and Supercoercivity
We now turn our attention to coercivity and supercoercivity. These properties of distances are important, because they are essential to the analysis of associated envelopes and proximity operators. After first providing a framework for verifying these properties of the GBDs, we will show in Section 4.3 how these properties admit corresponding coercivity properties for the sum of the GBDs together with Legendre functions. These results on sums are the key to analysing the envelopes; see [5,Lemma 2.12] and [6].
From now on, as mentioned, we assume our spaces to be reflexive, so that we may make use of the following fact from [14,Remark 3.12].

Fact 4.2 ([14, Remark 3.12]). When X is a reflexive space, it holds that $D^{\flat,h}_T(x, y) \ge \tfrac{1}{4}\, d^2(\{x\} \times Ty, G(S))$,
where d denotes the distance on $X \times X^*$ defined by $d((x, v), (y, w)) := \sqrt{\|x - y\|^2 + \|v - w\|^2}$. Consequently, we can see $D^{\flat,h}_T(x, y)$ as providing us with an upper estimate of the distance between the sets {x} × Ty and G(S).
Throughout this section, we exploit the fact that the GBD is minorized by the distance between the sets {x} × Ty and G(S) in order to establish left and right coercivity and supercoercivity of the distance. The intuition behind the results is shown in Figure 4.
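The minorization by a set distance can be illustrated in the simplest case X = R and S = T = Id with the Fitzpatrick representative. The sketch below assumes the closed form $F_{\mathrm{Id}}(x, v) = (x + v)^2/4$ and checks the inequality 4D(x, y) ≥ d²((x, y), G(Id)) on a few sample points; the names and sample points are illustrative.

```python
def gbd_fitzpatrick_id(x, y):
    """GBD for S = T = Id on R with h = F_Id, assuming F_Id(x, v) = (x + v)^2 / 4."""
    return 0.25 * (x + y) ** 2 - x * y


def squared_distance_to_diagonal(x, v):
    """Squared Euclidean distance from (x, v) to G(Id) = {(a, a)}."""
    a = 0.5 * (x + v)  # orthogonal projection of (x, v) onto the diagonal
    return (x - a) ** 2 + (v - a) ** 2


def minorization_holds(points, slack=1e-12):
    """Check 4*D(x, y) >= d^2({x} x Ty, G(S)) with Ty = {y} on the given points."""
    return all(
        4.0 * gbd_fitzpatrick_id(x, y) >= squared_distance_to_diagonal(x, y) - slack
        for x, y in points
    )


if __name__ == "__main__":
    sample = [(-3.0, 2.0), (0.0, 0.0), (1.5, 1.5), (4.0, -7.0), (10.0, 0.1)]
    print(minorization_holds(sample))
```

In this case 4D(x, y) = (x − y)² while the squared distance to the diagonal is (x − y)²/2, so the inequality holds with room to spare.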
The following elementary lemma will be useful for our analysis.

Lemma 4.3.
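Let $(x_n)_{n \in \mathbb{N}}$, $(y_n)_{n \in \mathbb{N}}$, and $(z_n)_{n \in \mathbb{N}}$ be sequences in X such that $\|x_n\| \to \infty$ as $n \to \infty$. Then the following hold: (i) Suppose that $\|z_{k_n}\| \to \infty$ whenever $(y_{k_n})_{n \in \mathbb{N}}$ is a subsequence of $(y_n)_{n \in \mathbb{N}}$ with $\|y_{k_n}\| \to \infty$. Then, for all $\alpha \in \mathbb{R}_{++}$, $\|x_n - y_n\|^{\alpha} + \|z_n\|^{\alpha} \to \infty$ as $n \to \infty$. (ii) Suppose that $\|z_{k_n}\|^2 / \|y_{k_n}\| \to \infty$ whenever $(y_{k_n})_{n \in \mathbb{N}}$ is a subsequence of $(y_n)_{n \in \mathbb{N}}$ with $\|y_{k_n}\| \to \infty$. Then $\big(\|x_n - y_n\|^2 + \|z_n\|^2\big) / \|x_n\| \to \infty$ as $n \to \infty$.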
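Proof. (i): Suppose to the contrary that there exist subsequences $(x_{k_n})_{n \in \mathbb{N}}$, $(y_{k_n})_{n \in \mathbb{N}}$, and $(z_{k_n})_{n \in \mathbb{N}}$ such that the sequence $(\|x_{k_n} - y_{k_n}\|^{\alpha} + \|z_{k_n}\|^{\alpha})_{n \in \mathbb{N}}$ is bounded. Then both $(x_{k_n} - y_{k_n})_{n \in \mathbb{N}}$ and $(z_{k_n})_{n \in \mathbb{N}}$ are bounded. By assumption, passing to another subsequence if necessary, we obtain that the sequence $(y_{k_n})_{n \in \mathbb{N}}$ is also bounded, and so is $(x_{k_n})_{n \in \mathbb{N}}$ since $\forall n \in \mathbb{N}$, $\|x_{k_n}\| \le \|x_{k_n} - y_{k_n}\| + \|y_{k_n}\|$.
This contradicts the assumption that $\|x_n\| \to \infty$.
(ii): Suppose that there exist subsequences $(x_{k_n})_{n \in \mathbb{N}}$, $(y_{k_n})_{n \in \mathbb{N}}$, $(z_{k_n})_{n \in \mathbb{N}}$ and a constant $\mu > 0$ such that $\forall n \in \mathbb{N}$, $\big(\|x_{k_n} - y_{k_n}\|^2 + \|z_{k_n}\|^2\big) / \|x_{k_n}\| < \mu$. (52)
Then, by the Cauchy-Schwarz inequality, $2\|y_{k_n}\| \ge 2\langle x_{k_n}, y_{k_n}\rangle / \|x_{k_n}\| = \big(\|x_{k_n}\|^2 + \|y_{k_n}\|^2 - \|x_{k_n} - y_{k_n}\|^2\big) / \|x_{k_n}\|$ (53a) $> \|x_{k_n}\| - \mu$. (53b) As $n \to \infty$, since $\|x_n\| \to \infty$, it follows from (53) that $\|y_{k_n}\| \to \infty$ and, by assumption, $\|z_{k_n}\|^2 / \|y_{k_n}\| \to \infty$. On the other hand, combining (52) with (53) yields $\|z_{k_n}\|^2 / \|y_{k_n}\| < \mu \|x_{k_n}\| / \|y_{k_n}\| \le \mu (2\|y_{k_n}\| + \mu) / \|y_{k_n}\| = 2\mu + \mu^2 / \|y_{k_n}\| \to 2\mu$.
A contradiction is thus obtained, and we complete the proof.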
Remark 4.4. (i) Since $D^{\flat,h}_T \le D^{\sharp,h}_T$, if $D^{\flat,h}_T$ is coercive or supercoercive with respect to the first or second variable, then so is $D^{\sharp,h}_T$. We will thus focus on the coercivity and supercoercivity of $D^{\flat,h}_T$. (ii) Assume that Ty is compact with respect to the strong topology. Then {x} × Ty is also compact. This, together with Fact 4.2 and the fact that G(S) is closed, allows us to choose $v \in Ty$ and $(a, b) \in G(S)$ such that $4 D^{\flat,h}_T(x, y) \ge d^2(\{x\} \times Ty, G(S)) = \|x - a\|^2 + \|v - b\|^2$.
Theorem 4.5 (Left coercivity and left supercoercivity of $D^{\flat,h}_T$). Let y ∈ dom T. Then (i) If dom S is bounded, then $D^{\flat,h}_T(\cdot, y)$ is supercoercive and hence coercive. Suppose further that Ty is compact with respect to the strong topology. Then the following hold: (ii) If S is coercive in the sense that $(a_n, b_n) \in G(S)$ and $\|a_n\| \to \infty$ imply $\|b_n\| \to \infty$, then $D^{\flat,h}_T(\cdot, y)$ is coercive. (iii) If $(a_n, b_n) \in G(S)$ and $\|a_n\| \to \infty$ imply $\|b_n\|^2 / \|a_n\| \to \infty$, then $D^{\flat,h}_T(\cdot, y)$ is supercoercive.
Proof. Let $(x_n)_{n \in \mathbb{N}}$ satisfy $\|x_n\| \to \infty$ as $n \to \infty$.
(i): As dom S is bounded, there exists $N \in \mathbb{N}$ such that for $n \ge N$ we have $x_n \notin \operatorname{dom} S$. Fixing an arbitrary $n \ge N$, by definition, $D^{\flat,h}_T(x_n, y) = \infty$, so $D^{\flat,h}_T(x_n, y) / \|x_n\| = \infty$, and we are done. To prove (ii) and (iii), we derive from Remark 4.4(ii) that, since Ty is compact, there exist $v_n \in Ty$ and $(a_n, b_n) \in G(S)$ such that $\forall n \in \mathbb{N}$, $4 D^{\flat,h}_T(x_n, y) \ge \|x_n - a_n\|^2 + \|v_n - b_n\|^2$. (56)
Here, we note that $\|x_n\| \to \infty$ as $n \to \infty$ and that $(v_n)_{n \in \mathbb{N}}$ is bounded due to compactness of Ty.
(ii): If $(a_{k_n})_{n \in \mathbb{N}}$ is a subsequence of $(a_n)_{n \in \mathbb{N}}$ with $\|a_{k_n}\| \to \infty$, then by assumption (ii), $\|b_{k_n}\| \to \infty$, which implies that $\|v_{k_n} - b_{k_n}\| \to \infty$. Applying Lemma 4.3(i) to the sequences $(x_n)_{n \in \mathbb{N}}$, $(a_n)_{n \in \mathbb{N}}$, and $(v_n - b_n)_{n \in \mathbb{N}}$, we obtain that $\|x_n - a_n\|^2 + \|v_n - b_n\|^2 \to \infty$ and, by (56), $D^{\flat,h}_T(x_n, y) \to \infty$ as $n \to \infty$.
(iii): If $(a_{k_n})_{n \in \mathbb{N}}$ is a subsequence of $(a_n)_{n \in \mathbb{N}}$ with $\|a_{k_n}\| \to \infty$, then by assumption (iii), $\|b_{k_n}\|^2 / \|a_{k_n}\| \to \infty$, so $\|b_{k_n}\| \to \infty$ and $\|v_{k_n} - b_{k_n}\|^2 / \|a_{k_n}\| \to \infty$. Now, using Lemma 4.3(ii) yields $\big(\|x_n - a_n\|^2 + \|v_n - b_n\|^2\big) / \|x_n\| \to \infty$ as $n \to \infty$, which together with (56) completes the proof.
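[Figure 4: labeled points $(y_n, 0)$, $(a_n, b_n)$, $(y_n, v_n)$, $(x, v_n)$; sets $a_n \times Sa_n$, $y_n \times Ty_n$, $x \times Ty_n$; distances $|y_n - a_n|$, $|v_n - b_n|$, $d(x \times Ty_n, G(S))$.]
Theorem 4.6 (Right coercivity and right supercoercivity of $D^{\flat,h}_T$). Let x ∈ dom S. Then (i) If dom T is bounded, then $D^{\flat,h}_T(x, \cdot)$ is supercoercive and hence coercive. Suppose further that T has strongly compact images. Then the following hold: (ii) If $(a_n, b_n) \in G(S)$, $(y_n, v_n) \in G(T)$, and $\|y_n - a_n\| \to \infty$ imply $\|v_n - b_n\| \to \infty$, then $D^{\flat,h}_T(x, \cdot)$ is coercive.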
(iii) If $(a_n, b_n) \in G(S)$, $(y_n, v_n) \in G(T)$, and $\|y_n - a_n\| \to \infty$ imply $\|v_n - b_n\|^2 / \|y_n - a_n\| \to \infty$, then $D^{\flat,h}_T(x, \cdot)$ is supercoercive.
Proof. Let $(y_n)_{n \in \mathbb{N}}$ satisfy $\|y_n\| \to \infty$ as $n \to \infty$.
(i): By the boundedness of dom T, there exists $N \in \mathbb{N}$ such that for $n \ge N$, $y_n \notin \operatorname{dom} T$. Fixing $n \ge N$, the definition of $D^{\flat,h}_T$ yields $D^{\flat,h}_T(x, y_n) = \infty$, which implies that $D^{\flat,h}_T(x, y_n) / \|y_n\| = \infty$, and we are done.
(ii) & (iii): Since T has strongly compact images, $Ty_n$ is compact with respect to the strong topology. By Remark 4.4(ii), there exist $v_n \in Ty_n$ and $(a_n, b_n) \in G(S)$ such that $\forall n \in \mathbb{N}$, $4 D^{\flat,h}_T(x, y_n) \ge \|x - a_n\|^2 + \|v_n - b_n\|^2$. As $n \to \infty$, $\|y_n - x\| \to \infty$ since $\|y_n\| \to \infty$.
Suppose (ii) holds. We have that if $(y_{k_n} - a_{k_n})_{n \in \mathbb{N}}$ is a subsequence of $(y_n - a_n)_{n \in \mathbb{N}}$ with $\|y_{k_n} - a_{k_n}\| \to \infty$, then $\|v_{k_n} - b_{k_n}\| \to \infty$. Applying Lemma 4.3(i) to the sequences $(y_n - x)_{n \in \mathbb{N}}$, $(y_n - a_n)_{n \in \mathbb{N}}$, and $(v_n - b_n)_{n \in \mathbb{N}}$ implies that $\|(y_n - x) - (y_n - a_n)\|^2 + \|v_n - b_n\|^2 \to \infty$, and so $D^{\flat,h}_T(x, y_n) \to \infty$ as $n \to \infty$. Suppose (iii) holds. We derive that if $(y_{k_n} - a_{k_n})_{n \in \mathbb{N}}$ is a subsequence of $(y_n - a_n)_{n \in \mathbb{N}}$ with $\|y_{k_n} - a_{k_n}\| \to \infty$, then $\|v_{k_n} - b_{k_n}\|^2 / \|y_{k_n} - a_{k_n}\| \to \infty$. Now, Lemma 4.3(ii) completes the proof.
As we will see in the following example, the conditions in Theorem 4.5 (resp. Theorem 4.6) are not necessary conditions for the left (resp. right) coercivity or supercoercivity of D ,h T .

Example 4.7.
Suppose that X = R. Let f = Id : R → R, S = ∇ f = 1, and h : R × R → R ∞ given by Let also T = 0. Then Therefore, both D h T (·, y) and D h T (x, ·) are supercoercive and hence coercive (for all x, y ∈ R), while S and T do not satisfy the assumptions in Theorem 4.5 nor in Theorem 4.6. (i) If dom S is bounded, D h (·, y) is supercoercive and hence coercive.
(ii) If S satisfies the property that $(a_n, b_n) \in G(S)$ and $\|a_n\| \to \infty$ imply $\|b_n\| \to \infty$, then $D^h(\cdot, y)$ is coercive. (iii) If $(a_n, b_n) \in G(S)$ and $\|a_n\| \to \infty$ imply $\|b_n\|^2 / \|a_n\| \to \infty$, then $D^h(\cdot, y)$ is supercoercive.
Proof. Apply Theorem 4.5 with T = S = ∂ f . Because f ∈ Γ 0 (H), we have that ∂ f is maximally monotone. The compactness of ∂ f (y) comes from the fact that ∂ f is point-to-point. To check, we need only show that f satisfies the criteria for Corollary 4.8.

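Corollary 4.10 (Right supercoercivity of $D^h$).
Let $f \in \Gamma_0(H)$, $h \in H(\partial f)$, and $x \in \operatorname{dom} \partial f$. Then the following hold: (i) If $\operatorname{dom} \partial f$ is bounded, then $D^h(x, \cdot)$ is supercoercive and hence coercive.
(ii) If $(a_n, b_n), (y_n, v_n) \in G(\partial f)$ and $\|y_n - a_n\| \to \infty$ imply $\|v_n - b_n\| \to \infty$, then $D^h(x, \cdot)$ is coercive.
(iii) If $(a_n, b_n), (y_n, v_n) \in G(\partial f)$ and $\|y_n - a_n\| \to \infty$ imply $\|v_n - b_n\|^2 / \|y_n - a_n\| \to \infty$, then $D^h(x, \cdot)$ is supercoercive.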
Proof. Apply Theorem 4.6 with T = S = ∂ f . Here ∂ f automatically has compact images because it is point-to-point. To check, we need only show that f satisfies the criteria for Corollary 4.10.
showing the sufficient conditions for Corollary 4.10. This example is illustrated at right in Figure 4. Let x n := n and y n := 0. Then, since ∇ f : x → sign(x)|x| 1/2 , we have that and so d((x n , ∇ f (y n ), G(∇ f )) 2 = x n for all n.
so the sufficient conditions from Theorem 4.5 fail.
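The form of $D^{\sigma_{\log}}$ is given in (45). If x or y is less than or equal to zero, $D^{\sigma_{\log}}(x, y) = \infty$. Fixing y > 0, we have that for x > 1 with x ≥ y: $D^{\sigma_{\log}}(x, y) / |x| = \log(x) - \log(y) \to \infty$ as $x \to \infty$, and so $D^{\sigma_{\log}}$ is left supercoercive.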
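The growth of the ratio $D^{\sigma_{\log}}(x, y)/x$ can be tabulated with a few lines of Python. The sketch assumes the form $D^{\sigma_{\log}}(x, y) = x \log x - x \log y$ for $0 < y \le x$ and $+\infty$ otherwise, obtained from $\sigma_{\log}(z_1, z_2) = z_1 \log z_1$ on $\{z_1 > 0,\ z_2 \le \log z_1\}$; the function name and sample values are illustrative.

```python
import math


def gbd_sigma_log(x, y):
    """Assumed form of the GBD for S = T = log with h = sigma_log."""
    if x <= 0 or y <= 0 or y > x:
        return math.inf
    return x * math.log(x) - x * math.log(y)


def left_ratios(y, exponents):
    """Ratios D(x, y)/x along x = 10**k for the given exponents k."""
    return [gbd_sigma_log(10.0 ** k, y) / 10.0 ** k for k in exponents]


if __name__ == "__main__":
    for k, r in zip(range(1, 7), left_ratios(2.0, range(1, 7))):
        print(f"x = 1e{k}: D(x, 2)/x = {r:.4f}")
```

The ratios equal log(x/y) and therefore grow without bound, which is the left supercoercivity asserted above.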

Coercivity of the sum of D and a convex function
The following propositions and their accompanying proofs extend and follow the template of Bauschke, Combettes, and Noll in [5,Lemma 2.12], with modifications necessary in order to handle the greater generality of D ,h T . In the following, X is assumed to be a real Hilbert space, U S := int dom S, and U T := int dom T.

Proposition 4.14 (Left coercivity of the sum of $D^{\flat,h}_T$ and a convex function). Let $\theta \in \Gamma_0(X)$ be such that $U_S \cap \operatorname{dom} \theta \ne \emptyset$ and let $\gamma \in \mathbb{R}_{++}$. Suppose that one of the following holds: (a) $U_S \cap \operatorname{dom} \theta$ is bounded and for all $y \in U_T$, $D^{\flat,h}_T(\cdot, y)$ is coercive. (b) $\inf \theta(U_S) > -\infty$ and for all $y \in U_T$, $D^{\flat,h}_T(\cdot, y)$ is coercive. (c) For all $y \in U_T$, $D^{\flat,h}_T(\cdot, y)$ is supercoercive. Then $\forall y \in U_T$, $\theta(\cdot) + \tfrac{1}{\gamma} D^{\flat,h}_T(\cdot, y)$ is coercive.

Conclusion
In Section 2, we illuminated the similarities between Bregman distances and the new GBDs, explaining the domain conditions under which they are equal when the Fenchel-Young representative is employed. We also introduced the lower closed GBD, a variant whose advantages we motivated in Sections 3 and 4.
In Section 3, we provided detailed examples of how to compute the new GBDs, illustrating with the energy and the Boltzmann-Shannon entropy, whose Bregman distances respectively correspond to the classical Moreau case and the Kullback-Leibler divergence. We compared the Fenchel-Young representative case with the two cases of the Fitzpatrick representative and its conjugate. These are the two other most natural representative functions to consider, because they serve as book-ends for the representative set H(S), as motivated in Section 2.
In Section 3.2 we answered the open question of finding the conjugate for the Fitzpatrick function of the logarithm. In so doing, we demonstrated how to use the graphical characterizations of representative functions in order to compute GBDs, and we illustrated the role that special functions like Lambert W play in computational discovery. The method of computational discovery that we used is prototypical of what one might employ in similar situations where the symbolic computation poses a challenge.
Section 4 contains the most important theoretical contribution of this work: a framework for verifying the coercivity and supercoercivity of the left and right distances, as well as the coercivity of the sum of these distances together with a Legendre function. We have also illustrated how this framework for sufficiency possesses a useful geometric interpretation, because the GBDs provide an upper estimate on a set distance. In our examples, we illustrated what might go wrong when sufficient criteria do not hold. These coercivity properties are important because of the role they play in establishing asymptotic properties for envelopes and proximity operators in the classical Bregman case, and also in establishing existence of minimizers of regularized problems; see, for example, [6,12,15,16,18]. Such properties are important because many optimization algorithms may be viewed as special cases of gradient descent applied to envelope functions.

Future Work
The coercivity framework we have established makes possible several new avenues of inquiry. While the conditions we provide for verifying coercivity and supercoercivity in Section 4 are sufficient, they are not always necessary. An important direction for future work is to catalogue useful (computable) distances for which the coercivity results hold. In particular, by establishing the aforementioned coercivity framework, we have set the table for a study of the left and right envelopes, along with their corresponding proximity operators. A much more interesting question is whether certain optimization algorithms might be viewed as gradient descent applied to GBD envelopes other than the already-known Fenchel-Young cases. Another natural question is: what do the dual characterizations of such algorithms look like?