Proposal for gradient wrt complex variables
This is a proposal to handle gradients of a scalar, real variable (usually, a cost) with respect to tensor variables, of complex (and real) type, from an optimization perspective.
The derivative of complex variables is usually studied only for so-called analytical complex functions, which have a particular structure in their partial derivatives. However, we do not want to limit ourselves to analytical functions, and we make other assumptions (for instance, that the final cost is real-valued), so we will adopt a different convention for gradients than the one usually used in the literature.
Gradient (re-)definition
We are interested in the case where we have a final real-valued cost, $C$, and a graph of mathematical expressions, including real-valued and complex-valued variables (scalars, vectors, matrices, higher-order tensors), and we want to compute the gradient of $C$ wrt some variables in that graph, using gradient back-propagation.
In the case where some variables are complex, the usual chain rule cannot be applied, except in some cases.
For each real-valued variable $x$ (not necessarily scalar, it could be a matrix, for instance), and in particular for the real and imaginary parts $z_\Re$ and $z_\Im$ of any complex variable $z$, partial derivatives can be defined: $\frac{\partial C}{\partial x}$ has the same number of dimensions and shape as $x$. We will limit that notation to real-valued variables only; this way, the partial derivative itself will be real-valued too. We will not use that notation for the complex derivative of analytical complex functions.
For any real-valued intermediate variable $v$ (assuming $C$ depends on $x$ only through $v$), the usual chain rule applies:

$$\frac{\partial C}{\partial x} = \frac{\partial C}{\partial v} \cdot \frac{\partial v}{\partial x}$$
If $z$ is a complex intermediate variable, with $z_\Re = \Re(z)$ and $z_\Im = \Im(z)$, we can consider $z_\Re$ and $z_\Im$ as free variables, and then:

$$\frac{\partial C}{\partial x} = \frac{\partial C}{\partial z_\Re} \cdot \frac{\partial z_\Re}{\partial x} + \frac{\partial C}{\partial z_\Im} \cdot \frac{\partial z_\Im}{\partial x}$$

If we want to use an algorithm similar to gradient backpropagation, we can see that, here, we need to have both $\frac{\partial C}{\partial z_\Re}$ and $\frac{\partial C}{\partial z_\Im}$ in order to compute $\frac{\partial C}{\partial x}$.
For each variable $v$ in the expression graph, let us denote by $\nabla_v C$ the gradient of $C$ with respect to $v$. It is a tensor with the same dimensions as $v$, and can be complex-valued. We define:

$$\nabla_v C = \frac{\partial C}{\partial v_\Re} + i \, \frac{\partial C}{\partial v_\Im}$$

This is the tensor that we are going to back-propagate through the computation graph.
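For instance (a worked example added here for illustration), with the scalar cost $C = |z|^2 = z_\Re^2 + z_\Im^2$, the definition gives

$$\nabla_z C = \frac{\partial C}{\partial z_\Re} + i \, \frac{\partial C}{\partial z_\Im} = 2 z_\Re + 2 i \, z_\Im = 2 z,$$

so $-\nabla_z C$ points from $z$ straight towards $0$, which is indeed the steepest-descent direction for $|z|^2$.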
Generalized chain rule

Using the definition above, if we have two complex variables $z$ and $u$ (with $z_\Re$, $z_\Im$, $u_\Re$, $u_\Im$ all real-valued), where $u$ is computed from $z$ and the cost $C$ is computed from $u$:

$$\nabla_z C = \Re(\nabla_u C) \cdot \nabla_z u_\Re + \Im(\nabla_u C) \cdot \nabla_z u_\Im$$

with $\nabla_z u_\Re = \frac{\partial u_\Re}{\partial z_\Re} + i \, \frac{\partial u_\Re}{\partial z_\Im}$ and $\nabla_z u_\Im = \frac{\partial u_\Im}{\partial z_\Re} + i \, \frac{\partial u_\Im}{\partial z_\Im}$ defined as above.

This formula can be used whether or not $C$ is an analytical function of $u$ or $z$, and whether or not $u$ is an analytical function of $z$.
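As a sanity check, here is a small NumPy sketch (added for illustration, not part of the original proposal; the test point and the choices $u = \bar z + z^2$, $C = |u - 1|^2$ are arbitrary) that compares the generalized chain rule with the finite-difference estimate described at the end of this document:

```python
import numpy as np

def u_of_z(z):
    # non-analytic intermediate function: u = conj(z) + z**2
    return np.conj(z) + z ** 2

def cost_of_u(u):
    # real-valued cost: C = |u - 1|**2
    return np.abs(u - 1.0) ** 2

def cost_of_z(z):
    return cost_of_u(u_of_z(z))

z = 0.7 - 1.3j                 # arbitrary test point
zr, zi = z.real, z.imag

# gradient of C wrt u, using the definition: dC/du_R + i*dC/du_I = 2*(u - 1)
u = u_of_z(z)
grad_u = 2.0 * (u - 1.0)

# gradients of u_R and u_I wrt z, using the same definition
grad_z_uR = (1.0 + 2.0 * zr) + 1j * (-2.0 * zi)
grad_z_uI = (2.0 * zi) + 1j * (-1.0 + 2.0 * zr)

# generalized chain rule
grad_z = grad_u.real * grad_z_uR + grad_u.imag * grad_z_uI

# finite-difference reference on the real and imaginary parts of z
eps = 1e-7
fd = ((cost_of_z(z + eps) - cost_of_z(z)) / eps
      + 1j * (cost_of_z(z + 1j * eps) - cost_of_z(z)) / eps)

print(grad_z, fd)              # the two values should agree to ~1e-6
```

Note that $u = \bar z + z^2$ is not analytic (because of the conjugate), yet the formula applies unchanged.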
Special cases

Real-valued input variable

If variable $x$ is defined as real-valued, it can sometimes be useful to have the value of $\nabla_x C$ instead of only $\frac{\partial C}{\partial x}$, because the imaginary part contains information on how the cost would change if $x_\Im$ was not constrained to be 0.
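For instance (an illustrative example, not from the original text), if $x$ is constrained to be real and $C = |x - i|^2 = x_\Re^2 + (x_\Im - 1)^2$, then at $x_\Im = 0$:

$$\nabla_x C = 2 x_\Re + i \, (2 x_\Im - 2) = 2 x - 2 i,$$

whose imaginary part $-2$ says that the cost would decrease if $x$ were allowed to drift towards $+i$, information that $\frac{\partial C}{\partial x} = 2x$ alone does not carry.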
Real-valued intermediate variable

When $u$ is a real-valued intermediate variable, however, the gradient of $C$ wrt $u_\Im$ must not be backpropagated through $u$ to the variables $u$ depends on.
Therefore, we have:

$$\nabla_z C = \Re(\nabla_u C) \cdot \nabla_z u$$

The imaginary part of $\nabla_u C$ is ignored, because $u_\Im$ is constrained to be 0.
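To check this on a small example (added here for illustration), take $u = |z|^2$ (real-valued) and $C = u^2$. Then $\nabla_u C = 2u$, $\nabla_z u = 2 z_\Re + 2 i \, z_\Im = 2 z$, and the rule above gives

$$\nabla_z C = \Re(2u) \cdot 2 z = 4 \, |z|^2 \, z,$$

which matches the direct computation $\frac{\partial C}{\partial z_\Re} = 4 z_\Re |z|^2$ and $\frac{\partial C}{\partial z_\Im} = 4 z_\Im |z|^2$.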
Analytic functions

If $u$ is the output of an analytic function of $z$, some simplifications are possible. Analytic functions include, for instance, polynomial functions and the exponential function. Most complex functions, however, are not analytic: absolute value, real part, imaginary part, complex conjugate, etc.

Analytic (or holomorphic) functions satisfy the Cauchy-Riemann equations:

$$\frac{\partial u_\Re}{\partial x} = \frac{\partial u_\Im}{\partial y} \qquad \text{and} \qquad \frac{\partial u_\Re}{\partial y} = -\frac{\partial u_\Im}{\partial x}$$

Or, in our case:

$$\frac{\partial u_\Re}{\partial z_\Re} = \frac{\partial u_\Im}{\partial z_\Im} \qquad \text{and} \qquad \frac{\partial u_\Re}{\partial z_\Im} = -\frac{\partial u_\Im}{\partial z_\Re}$$

This leads to:

$$\nabla_z u_\Im = i \, \nabla_z u_\Re,$$

so the generalized chain rule collapses to

$$\nabla_z C = \left( \Re(\nabla_u C) + i \, \Im(\nabla_u C) \right) \nabla_z u_\Re = \nabla_u C \cdot \overline{\left( \frac{\partial u}{\partial z} \right)},$$

where $\frac{\partial u}{\partial z}$ is the usual complex derivative of the analytic function.
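The following NumPy sketch (added as an illustration, not part of the original proposal; the choice of $u = \exp(z)$ and $C = |u|^2$ is arbitrary) checks this simplified rule against finite differences on the real and imaginary parts of $z$:

```python
import numpy as np

def cost(z):
    u = np.exp(z)              # analytic intermediate function
    return np.abs(u) ** 2      # real-valued cost C = |u|^2

z = 0.3 + 0.9j                 # arbitrary test point
u = np.exp(z)

grad_u = 2.0 * u               # nabla_u C = dC/du_R + i*dC/du_I = 2*(u_R + i*u_I)
du_dz = np.exp(z)              # complex derivative of exp
grad_z = grad_u * np.conj(du_dz)   # simplified rule for analytic functions

# finite-difference estimate of dC/dz_R + i*dC/dz_I
eps = 1e-7
fd = ((cost(z + eps) - cost(z)) / eps
      + 1j * (cost(z + 1j * eps) - cost(z)) / eps)

print(grad_z, fd)              # should agree to roughly 6 significant digits
```

Since $C = |e^z|^2 = e^{2 z_\Re}$ here, both estimates should come out (numerically) real and close to $2 e^{2 z_\Re}$.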
Finite differences

In order to verify that the mathematical formula for a gradient, or its implementation, is correct, we usually use a finite-difference approach. If $C$ is our real scalar cost, and $x$ a real-valued scalar variable, then:

$$\frac{\partial C}{\partial x} \approx \frac{C(x + \varepsilon) - C(x)}{\varepsilon}$$

where $\varepsilon$ is also a real scalar, of small magnitude (typically $10^{-6}$ to $10^{-4}$). If $x$ is a tensor, then this approximation has to be made for each element independently (a different $\varepsilon$ could be used each time, but usually they are all equal to the same $\varepsilon$).
For a complex scalar variable $z$:

$$\nabla_z C \approx \frac{C(z + \varepsilon_\Re) - C(z)}{\varepsilon_\Re} + i \, \frac{C(z + i \varepsilon_\Im) - C(z)}{\varepsilon_\Im}$$

Both partial derivatives have to be estimated independently, generally using $\varepsilon_\Re = \varepsilon_\Im$.
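A minimal sketch of how such a check could be implemented (illustrative code, not part of the proposal; `finite_diff_grad` and its arguments are hypothetical names), handling real- and complex-valued tensors element by element:

```python
import numpy as np

def finite_diff_grad(cost, x, eps=1e-6):
    """Estimate nabla_x C element-wise by finite differences.

    `cost` is a function returning a real scalar; `x` is a real or
    complex ndarray.  For complex inputs, the real and imaginary
    perturbations are applied independently, with eps_R = eps_I = eps.
    """
    grad = np.zeros_like(x, dtype=complex if np.iscomplexobj(x) else float)
    c0 = cost(x)
    for idx in np.ndindex(x.shape):
        xp = x.copy()
        xp[idx] += eps                      # perturb the real part
        g = (cost(xp) - c0) / eps
        if np.iscomplexobj(x):
            xp = x.copy()
            xp[idx] += 1j * eps             # perturb the imaginary part
            g = g + 1j * (cost(xp) - c0) / eps
        grad[idx] = g
    return grad

# example: C = sum |z - (1 + 1j)|^2, whose gradient should be 2*(z - (1 + 1j))
z = np.array([0.5 + 0.2j, -1.0 + 1.5j])
cost = lambda v: float(np.sum(np.abs(v - (1 + 1j)) ** 2))
print(finite_diff_grad(cost, z))
print(2 * (z - (1 + 1j)))
```

The forward-difference estimates agree with the exact gradient only up to an error of roughly $\varepsilon$, so comparisons should use a tolerance rather than exact equality.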