The Tangent-Cotangent Isomorphism
A finite-dimensional vector space and its dual have the same dimension and are therefore isomorphic, but the isomorphism is not
canonical: it depends on a choice of basis. An inner product removes that ambiguity, pairing each
vector with the covector that measures inner products against it. A Riemannian metric does this at
every tangent space at once, and does so smoothly, producing a bundle isomorphism between the
tangent and cotangent bundles that belongs to the manifold itself rather than to any chart.
Let \((M, g)\) be a Riemannian manifold with or without boundary. For each \(p \in M\) and each \(v
\in T_pM\), the assignment \(w \mapsto g_p(v, w)\) is a linear functional on \(T_pM\), hence an
element of the cotangent space \(T_p^*M\). Collecting these defines a map between bundles.
Definition: The Tangent-Cotangent Map
Let \((M, g)\) be a Riemannian manifold. The tangent-cotangent map is the
bundle homomorphism \(\hat{g} : TM \to T^*M\) over \(M\) that assigns to each \(v \in T_pM\) the
covector \(\hat{g}(v) \in T_p^*M\) defined by
\[
\hat{g}(v)(w) = g_p(v, w) \qquad \text{for all } w \in T_pM .
\]
That \(\hat{g}\) is a smooth bundle homomorphism is seen most easily through its action on vector
fields. For \(X, Y \in \mathfrak{X}(M)\), define
\(\hat{g}(X)\) by
\[
\hat{g}(X)(Y) = g(X, Y) ,
\]
a smooth function on \(M\). As a function of \(Y\) this is linear over \(C^\infty(M)\), so by the
tensor characterization lemma \(\hat{g}(X)\) is a smooth covector field; as a function of \(X\) it
too is linear over
\(C^\infty(M)\), so \(\hat{g}\) is a smooth bundle homomorphism, inducing a linear map on sections
\(\hat{g} : \mathfrak{X}(M) \to \mathfrak{X}^*(M)\). The same symbol denotes both the pointwise map
and the map on sections.
The property that makes the construction useful is that this homomorphism is an isomorphism, so that vectors and covectors are
interchangeable once a metric is fixed.
Theorem: The Tangent-Cotangent Isomorphism
For any Riemannian manifold \((M, g)\), the tangent-cotangent map \(\hat{g} : TM \to T^*M\) is a
smooth bundle isomorphism.
Proof:
At each point \(p\), the map \(\hat{g} : T_pM \to T_p^*M\) is injective: if \(\hat{g}(v) = 0\),
then \(g_p(v, w) = 0\) for all \(w \in T_pM\), and taking \(w = v\) gives \(g_p(v, v) = 0\),
which forces \(v = 0\) by positive definiteness of the metric. Since \(T_pM\) and \(T_p^*M\) have
the same finite dimension, an injective linear map between them is bijective. Thus \(\hat{g}\) is
a bijective smooth bundle homomorphism, and a bijective smooth bundle homomorphism is an isomorphism,
so \(\hat{g}\) is a smooth bundle isomorphism with smooth inverse \(\hat{g}^{-1} : T^*M \to TM\).
Lowering and Raising Indices
The isomorphism and its inverse carry names borrowed from music. Sending a vector to its
associated covector is written with a flat sign and called lowering an index; the inverse, sending a
covector to its vector, is written with a sharp sign and called raising an index. The same pair of
operations appears whenever an inner product identifies a space with its dual: on a single inner
product space it is the Riesz correspondence, and here it is carried out at every tangent space at once.
Definition: The Flat Map (Lowering an Index)
Let \((M, g)\) be a Riemannian manifold. The flat map is the bundle
isomorphism \(\flat = \hat{g} : TM \to T^*M\). For a vector field \(X \in \mathfrak{X}(M)\), its
image is written \(X^\flat = \hat{g}(X)\), a covector field, said to be obtained from \(X\) by
lowering an index.
In smooth coordinates \((x^i)\) the metric is \(g = g_{ij}\, dx^i\, dx^j\), so for \(X = X^i\,
\partial/\partial x^i\) the defining relation \(X^\flat(Y) = g(X, Y)\) gives, on the coordinate
vector field \(\partial/\partial x^j\),
\[
X^\flat\!\left( \frac{\partial}{\partial x^j} \right) = g\!\left( X^i \frac{\partial}{\partial x^i},\, \frac{\partial}{\partial x^j} \right) = g_{ij}\, X^i .
\]
Writing \(X^\flat = X_j\, dx^j\), this reads \(X_j = g_{ij}\, X^i\): the components of the covector
are obtained from those of the vector by contracting with the metric, lowering the upper index to a
lower one. The matrix of the flat map in coordinate frames is exactly the matrix \((g_{ij})\) of the
metric.
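The contraction \(X_j = g_{ij}\, X^i\) can be checked numerically. The following is a minimal sketch in plain Python, not a library routine: the helper names `lower_index`, `pair`, and `metric` are hypothetical, and the sample matrix is the polar-coordinate form of the Euclidean metric evaluated at \(r = 2\), chosen only for illustration.

```python
def lower_index(g, X):
    """Components X_j = g_ij X^i of the covector X^flat."""
    n = len(X)
    return [sum(g[i][j] * X[i] for i in range(n)) for j in range(n)]


def pair(omega, Y):
    """Evaluate a covector on a vector: omega(Y) = omega_j Y^j."""
    return sum(w * y for w, y in zip(omega, Y))


def metric(g, X, Y):
    """The inner product g(X, Y) = g_ij X^i Y^j."""
    n = len(X)
    return sum(g[i][j] * X[i] * Y[j] for i in range(n) for j in range(n))


# Illustrative data: the matrix diag(1, r^2) at r = 2 and two sample vectors.
r = 2.0
g = [[1.0, 0.0], [0.0, r**2]]
X = [3.0, 0.5]
Y = [-1.0, 2.0]

X_flat = lower_index(g, X)

# The defining relation X^flat(Y) = g(X, Y) holds for the sample vectors.
assert abs(pair(X_flat, Y) - metric(g, X, Y)) < 1e-12
```

Here the second component of `X_flat` is the second component of `X` scaled by \(r^2 = 4\), which is the effect of contracting with a diagonal metric.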
The inverse operation requires the inverse of that matrix. At each point the matrix \((g_{ij})\) is
positive definite, hence invertible; denote its inverse by \((g^{ij})\), the matrix-valued function
characterized pointwise by
\[
g^{ij}\, g_{jk} = \delta^i_k .
\]
Since \((g_{ij})\) is symmetric, so is \((g^{ij})\).
Definition: The Sharp Map (Raising an Index)
Let \((M, g)\) be a Riemannian manifold. The sharp map is the inverse bundle
isomorphism \(\sharp = \hat{g}^{-1} : T^*M \to TM\). For a covector field \(\omega \in
\mathfrak{X}^*(M)\), its image is written \(\omega^\sharp = \hat{g}^{-1}(\omega)\), a vector
field, said to be obtained from \(\omega\) by raising an index. In coordinates,
for \(\omega = \omega_j\, dx^j\),
\[
\omega^\sharp = \omega^i\, \frac{\partial}{\partial x^i} , \qquad \omega^i = g^{ij}\, \omega_j ,
\]
the components of the vector obtained by contracting those of the covector with the inverse
metric.
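That raising and lowering are mutually inverse can be seen on a small example. The sketch below assumes a \(2 \times 2\) metric matrix so that the inverse has a closed form; the function names and the sample matrix are illustrative assumptions, not part of the text above.

```python
def inverse_2x2(g):
    """Inverse (g^ij) of a 2x2 matrix (g_ij), via the adjugate formula."""
    (a, b), (c, d) = g
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is degenerate")
    return [[d / det, -b / det], [-c / det, a / det]]


def lower_index(g, X):
    """Components X_j = g_ij X^i."""
    n = len(X)
    return [sum(g[i][j] * X[i] for i in range(n)) for j in range(n)]


def raise_index(g_inv, omega):
    """Components omega^i = g^ij omega_j."""
    n = len(omega)
    return [sum(g_inv[i][j] * omega[j] for j in range(n)) for i in range(n)]


# Illustrative symmetric positive definite matrix (determinant 5).
g = [[2.0, 1.0], [1.0, 3.0]]
g_inv = inverse_2x2(g)

omega = [1.0, -2.0]
omega_sharp = raise_index(g_inv, omega)

# Lowering the raised covector returns the original components.
round_trip = lower_index(g, omega_sharp)
assert all(abs(a - b) < 1e-12 for a, b in zip(round_trip, omega))
```

The inverse matrix computed here is again symmetric, as stated above for \((g^{ij})\).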
On a single inner product space the same inverse operation recovers a vector from a covector
through the Riesz representation; the coordinate formula \(\omega^i = g^{ij}\,\omega_j\) is what that
recovery becomes once a basis is fixed that need not be orthonormal, so that the matrix of the inner
product is no longer the identity. A mnemonic keeps the two straight: the value of \(\omega^\sharp\)
is a vector, visualized as an arrow, which is a sharp object, while the value of \(X^\flat\) is a
covector, visualized through its level sets, which are flat.
The Gradient
The sharp map recovers the gradient as a vector field. In elementary calculus the gradient
of a function is a vector, but the natural derivative of a function on a manifold is its
differential, a covector. Raising the index of the differential converts it back into a vector
field, and that vector field is the gradient. The construction depends on the metric, which is why
the gradient is a Riemannian notion while the differential is not.
Definition: The Gradient
Let \((M, g)\) be a Riemannian manifold and \(f \in C^\infty(M)\). The gradient
of \(f\) is the vector field obtained by raising the index of the
differential,
\[
\operatorname{grad} f = (df)^\sharp = \hat{g}^{-1}(df) .
\]
Unraveling the definition gives the property that characterizes the gradient and matches the
elementary one. The same construction on a single inner product space defines the gradient through
the Riesz correspondence; here it is performed pointwise by the metric.
Theorem: Characterization of the Gradient
For \(f \in C^\infty(M)\), the gradient \(\operatorname{grad} f\) is the unique vector field
satisfying
\[
\langle \operatorname{grad} f,\, X \rangle_g = df(X) = Xf \qquad \text{for every } X \in \mathfrak{X}(M) .
\]
Proof:
Apply \(\hat{g}\) to \(\operatorname{grad} f = \hat{g}^{-1}(df)\): since \(\hat{g}\) and
\(\hat{g}^{-1}\) are inverse, \(\hat{g}(\operatorname{grad} f) = df\). Evaluating both sides on a
vector field \(X\) and using the definition \(\hat{g}(V)(X) = g(V, X) = \langle V, X \rangle_g\) gives
\(\langle \operatorname{grad} f, X \rangle_g = df(X)\), and \(df(X) = Xf\) by the definition of
the differential. Uniqueness follows because \(\hat{g}\) is an isomorphism: any vector field
\(V\) with \(\langle V, X \rangle_g = df(X)\) for all \(X\) satisfies \(\hat{g}(V) = df\), hence
\(V = \hat{g}^{-1}(df) = \operatorname{grad} f\).
In coordinates the gradient is the index-raised differential. Since \(df = (\partial f / \partial
x^i)\, dx^i\), raising the index gives
\[
\operatorname{grad} f = g^{ij}\, \frac{\partial f}{\partial x^i}\, \frac{\partial}{\partial x^j} .
\]
This expression is smooth, so the gradient is a smooth vector field. On \(\mathbb{R}^n\) with the
Euclidean metric, where \(g^{ij} = \delta^{ij}\), it reduces to the familiar \(\operatorname{grad} f
= \sum_i (\partial f / \partial x^i)\, \partial/\partial x^i\); in other coordinates the form is
different, because the inverse metric is no longer the identity.
Example (Gradient in Polar Coordinates):
On \(\mathbb{R}^2\) the Euclidean metric in polar coordinates is \(dr^2 + r^2\, d\theta^2\), with
matrix \(\operatorname{diag}(1, r^2)\) and inverse \(\operatorname{diag}(1, r^{-2})\). For \(f
\in C^\infty(\mathbb{R}^2)\), inserting \((g^{ij}) = \operatorname{diag}(1, r^{-2})\) into the
coordinate formula yields
\[
\operatorname{grad} f = \frac{\partial f}{\partial r}\, \frac{\partial}{\partial r} + \frac{1}{r^2}\, \frac{\partial f}{\partial \theta}\, \frac{\partial}{\partial \theta} .
\]
The angular term carries the factor \(r^{-2}\), absent in the Cartesian expression, reflecting
that a unit change in \(\theta\) corresponds to a larger displacement the farther one is from
the origin.
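The polar formula can be compared with the Cartesian gradient numerically. The sketch below is an illustration under stated assumptions: the test function \(f(x, y) = x^2 + 3y\) is arbitrary, partial derivatives are approximated by central differences, and all helper names are hypothetical. It computes the polar components \((\partial f/\partial r,\ r^{-2}\, \partial f/\partial\theta)\), re-expresses the resulting vector in the Cartesian frame using \(\partial/\partial r = (\cos\theta, \sin\theta)\) and \(\partial/\partial\theta = (-r\sin\theta, r\cos\theta)\), and compares it with \((2x, 3)\).

```python
import math


def f_cartesian(x, y):
    """Arbitrary smooth test function, chosen for illustration."""
    return x**2 + 3.0 * y


def f_polar(r, theta):
    """The same function expressed in polar coordinates."""
    return f_cartesian(r * math.cos(theta), r * math.sin(theta))


def partials(func, a, b, h=1e-6):
    """Central-difference approximations to the two partial derivatives."""
    da = (func(a + h, b) - func(a - h, b)) / (2 * h)
    db = (func(a, b + h) - func(a, b - h)) / (2 * h)
    return da, db


def grad_polar(r, theta):
    """Polar components of grad f: contract df with diag(1, r^-2)."""
    f_r, f_theta = partials(f_polar, r, theta)
    return f_r, f_theta / r**2


def to_cartesian_frame(r, theta, v_r, v_theta):
    """Express v_r d/dr + v_theta d/dtheta in the frame (d/dx, d/dy)."""
    vx = v_r * math.cos(theta) - v_theta * r * math.sin(theta)
    vy = v_r * math.sin(theta) + v_theta * r * math.cos(theta)
    return vx, vy


r, theta = 2.0, 0.7
v_r, v_theta = grad_polar(r, theta)
gx, gy = to_cartesian_frame(r, theta, v_r, v_theta)

# Cartesian gradient of x^2 + 3y is (2x, 3).
x = r * math.cos(theta)
assert abs(gx - 2.0 * x) < 1e-6
assert abs(gy - 3.0) < 1e-6
```

Omitting the factor \(r^{-2}\) in `grad_polar` makes the comparison fail for \(r \neq 1\), which is one way to see that the bare partial derivatives are the components of \(df\), not of the gradient.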
The characterizing identity says that the gradient is the vector field whose inner product with any
vector field \(X\) reproduces the directional derivative \(Xf\). At a point where
\(\operatorname{grad} f \neq 0\), its direction is the one in which \(f\) increases fastest among unit
vectors, its length is the maximum of those directional derivatives, and it is orthogonal to the
level set of \(f\) through that point; these are the same geometric facts the
gradient has in Euclidean space, now read off through the metric \(g\).
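The steepest-ascent property can be illustrated at a single point. The sketch below assumes a fixed \(2 \times 2\) metric matrix and fixed components of \(df\), both chosen arbitrarily; it samples vectors of unit length with respect to \(g\), evaluates \(df\) on each, and compares the largest value found with the \(g\)-length of the gradient. The names are hypothetical.

```python
import math


def metric(g, X, Y):
    """The inner product g(X, Y) = g_ij X^i Y^j in dimension 2."""
    return sum(g[i][j] * X[i] * Y[j] for i in range(2) for j in range(2))


def raise_index(g, omega):
    """Solve g_ij V^i = omega_j for V in the 2x2 case, so V = omega^sharp."""
    (a, b), (c, d) = g
    det = a * d - b * c
    return [(d * omega[0] - b * omega[1]) / det,
            (-c * omega[0] + a * omega[1]) / det]


# Illustrative data at one point: a metric matrix and the components of df.
g = [[2.0, 1.0], [1.0, 3.0]]
df = [1.0, -2.0]

grad = raise_index(g, df)
grad_norm = math.sqrt(metric(g, grad, grad))


def rate(phi):
    """df evaluated on the g-unit vector pointing at coordinate angle phi."""
    u = [math.cos(phi), math.sin(phi)]
    length = math.sqrt(metric(g, u, u))
    return (df[0] * u[0] + df[1] * u[1]) / length


rates = [rate(2.0 * math.pi * k / 720) for k in range(720)]
best = max(rates)

# No sampled unit direction beats the gradient direction, and the
# maximum rate agrees with the g-length of the gradient.
assert best <= grad_norm + 1e-12
assert abs(best - grad_norm) < 1e-9
```

Note that the unit vectors are normalized with \(g\), not with the Euclidean norm of the coordinate tuple; with a different normalization the maximizing direction would in general be different.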
Steepest Descent and the Choice of Metric
The coordinate formula \(\operatorname{grad} f = g^{ij}\,\partial_i f\,\partial_j\) shows that
the gradient is not the bare tuple of partial derivatives but that tuple transformed by the
inverse metric. A learning system that adjusts parameters along the direction of fastest
decrease of a loss is descending along a gradient, and which direction that is depends on the
metric placed on the parameter space. Using the identity matrix recovers ordinary steepest
descent; using a metric adapted to the geometry of the problem yields a different and often
more stable direction. When the parameter space is a space of probability distributions, a
natural choice of metric is the Fisher information matrix \(F(\theta)\), built from the distributions
themselves, and the corresponding gradient is the natural gradient
\(F(\theta)^{-1}\nabla\mathcal{L}\), where \(\nabla\mathcal{L}\) is the tuple of partial derivatives
of the loss \(\mathcal{L}\) and the inverse Fisher matrix plays exactly the role of
\((g^{ij})\) above. Its negative gives the direction of steepest descent measured in the intrinsic
geometry of the model rather than in the arbitrary coordinates of the parameters, and the
resulting natural gradient descent is developed as a method in its own right elsewhere. This is the
entry point to information geometry, where the metric on a statistical model and the gradients it
induces become the principal objects of study.
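A small example makes the difference between the two gradients concrete. The sketch below is an illustration under stated assumptions, not a general implementation: the model is the family of univariate Gaussians with parameters \((\mu, \sigma)\), whose Fisher information matrix in these parameters is \(\operatorname{diag}(1/\sigma^2,\ 2/\sigma^2)\), and the loss is the expected negative log-likelihood of data drawn from a target Gaussian with mean `m` and standard deviation `s`. The function names and the numerical values are hypothetical.

```python
def loss_gradient(mu, sigma, m, s):
    """Partial derivatives in (mu, sigma) of the expected negative
    log-likelihood  log(sigma) + (s^2 + (m - mu)^2) / (2 sigma^2)."""
    d_mu = (mu - m) / sigma**2
    d_sigma = 1.0 / sigma - (s**2 + (m - mu)**2) / sigma**3
    return [d_mu, d_sigma]


def fisher_inverse(sigma):
    """Inverse of the Fisher matrix diag(1/sigma^2, 2/sigma^2)."""
    return [[sigma**2, 0.0], [0.0, sigma**2 / 2.0]]


def natural_gradient(mu, sigma, m, s):
    """Contract the partial derivatives with the inverse Fisher matrix."""
    grad = loss_gradient(mu, sigma, m, s)
    F_inv = fisher_inverse(sigma)
    return [sum(F_inv[i][j] * grad[j] for j in range(2)) for i in range(2)]


# Illustrative values: a broad model far from a narrow target.
mu, sigma = 0.0, 10.0
m, s = 3.0, 1.0

ordinary = loss_gradient(mu, sigma, m, s)
natural = natural_gradient(mu, sigma, m, s)

# The mu-component of the natural gradient is mu - m, independent of sigma,
# whereas the ordinary partial derivative is scaled down by 1 / sigma^2.
assert abs(natural[0] - (mu - m)) < 1e-12
assert abs(ordinary[0] - (mu - m) / sigma**2) < 1e-12
```

In this example the ordinary partial derivative in \(\mu\) shrinks as \(\sigma\) grows, while the natural gradient component does not, which is one sense in which the natural gradient is insensitive to how the parameters happen to be scaled.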
Pseudo-Riemannian Metrics
Everything built so far rests on one property of the metric: that the tangent-cotangent map it
induces is an isomorphism. Positive definiteness was used to prove that, but it is more than is
needed. Relaxing positivity while keeping the isomorphism yields a broader class of metrics, the
setting for the geometry underlying relativity, in which \(g(v, v)\) may be negative or zero for a nonzero vector \(v\), yet the musical
correspondence between vectors and covectors survives intact.
The property that the tangent-cotangent map is an isomorphism has a name at the level of a single
vector space. A symmetric \(2\)-tensor \(g\) on a finite-dimensional vector space \(V\) is called
nondegenerate if the linear map \(\hat{g} : V \to V^*\) defined by \(\hat{g}(v)(w)
= g(v, w)\) is an isomorphism; equivalently, if for every nonzero \(v \in V\) there is some \(w \in
V\) with \(g(v, w) \ne 0\). A positive definite \(g\) is nondegenerate, since for nonzero \(v\) the
choice \(w = v\) gives \(g(v, v) > 0\); but nondegeneracy is weaker, requiring only that no nonzero
vector be orthogonal to everything.
Just as an inner product can be diagonalized to the identity by passing to an orthonormal basis,
any nondegenerate symmetric \(2\)-tensor can be brought by a change of basis to a diagonal form with
every entry equal to \(+1\) or \(-1\). The number \(r\) of positive entries and the number \(s\) of
negative entries do not depend on the basis chosen; this invariance is Sylvester's law of inertia, a
classical theorem on symmetric bilinear forms, which we take as known. The resulting pair \((r, s)\)
is the signature of the tensor.
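In dimension two the signature can be read off from the determinant and trace of the matrix, which gives a quick way to see the invariance in an example. The sketch below is restricted to the \(2 \times 2\) case and uses hypothetical helper names; a change of basis with matrix \(A\) replaces the matrix \(G\) of the tensor by \(A^{T} G A\).

```python
def signature_2x2(g):
    """Signature (r, s) of a nondegenerate symmetric 2x2 matrix.

    Uses only the determinant and trace, so it is specific to the 2x2 case.
    """
    (a, b), (c, d) = g
    if b != c:
        raise ValueError("matrix is not symmetric")
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is degenerate")
    if det < 0:
        return (1, 1)  # eigenvalues of opposite sign
    return (2, 0) if a + d > 0 else (0, 2)


def change_basis(g, A):
    """Matrix A^T g A of the same tensor in the basis given by the columns of A."""
    n = len(g)
    gA = [[sum(g[i][k] * A[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(A[k][i] * gA[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Nondegenerate but not positive definite: g(e1, e1) = 0 while g(e1, e2) = 1.
hyperbolic = [[0, 1], [1, 0]]

# An arbitrary invertible change of basis.
A = [[1, 2], [0, 1]]
transformed = change_basis(hyperbolic, A)

# The entries change, the signature does not.
assert signature_2x2(hyperbolic) == signature_2x2(transformed) == (1, 1)
```

The matrix `hyperbolic` also illustrates that a nondegenerate tensor can vanish on a nonzero vector paired with itself, which a positive definite one cannot.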
Definition: Pseudo-Riemannian and Lorentz Metrics
A pseudo-Riemannian metric on a smooth manifold \(M\) is a smooth symmetric
\(2\)-tensor field whose value at each point is nondegenerate, with the same signature at every
point. A pseudo-Riemannian metric of signature \((n - 1, 1)\) or \((1, n - 1)\), depending on
convention, is called a Lorentz metric; metrics of this kind model spacetime
in the general theory of relativity, where the single distinguished sign corresponds to the
time direction.
A Riemannian metric is the special case of signature \((n, 0)\). For a pseudo-Riemannian metric the
entire tangent-cotangent apparatus continues to function: nondegeneracy is exactly the condition
that \(\hat{g}\) be a bundle isomorphism, so the flat and sharp maps, the lowering and raising of
indices, and the gradient are all defined verbatim, with the inverse matrix \((g^{ij})\) now having
entries of either sign. What is lost is the notion of length: \(g(v, v)\) may be positive, negative,
or zero for nonzero \(v\), so it no longer defines a norm, and the distance theory of the previous
development has no direct counterpart.
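Both halves of this statement can be seen in the simplest Lorentz example. The sketch below assumes the constant metric \(\operatorname{diag}(-1, 1, 1, 1)\) on \(\mathbb{R}^4\), one of the two sign conventions mentioned above, and uses hypothetical helper names. It lowers the index of a nonzero vector \(v\) with \(g(v, v) = 0\) and checks that the resulting covector is nonzero and that raising its index recovers \(v\).

```python
# Constant Lorentz metric of signature (3, 1) in the convention diag(-1, 1, 1, 1).
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def metric(g, v, w):
    """g(v, w) = g_ij v^i w^j."""
    n = len(v)
    return sum(g[i][j] * v[i] * w[j] for i in range(n) for j in range(n))


def contract(matrix, components):
    """Contract a symmetric matrix with a tuple of components."""
    n = len(components)
    return [sum(matrix[i][j] * components[i] for i in range(n)) for j in range(n)]


v = [1, 1, 0, 0]            # nonzero vector with g(v, v) = 0
v_flat = contract(ETA, v)   # lowering: v_j = g_ij v^i

# ETA is its own inverse, so the same contraction also raises the index.
v_back = contract(ETA, v_flat)

assert metric(ETA, v, v) == 0          # no length information in g(v, v)
assert any(c != 0 for c in v_flat)     # yet v^flat is a nonzero covector
assert v_back == v                     # and sharp undoes flat
```

The vector `v` is orthogonal to itself but not to everything: pairing it with the first basis vector gives a nonzero value, which is what nondegeneracy requires.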
One feature does not survive the generalization. The existence of a Riemannian metric on every
smooth manifold was proved by patching local metrics together with a partition of unity, an
argument that relied on a positive combination of positive definite tensors remaining positive
definite. For pseudo-Riemannian metrics this fails, since a combination of nondegenerate tensors
need not be nondegenerate, and indeed not every manifold admits a Lorentz metric. The passage from
Riemannian to pseudo-Riemannian geometry thus trades the guarantee of existence for the reach into
spaces where time and space carry opposite signs.
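The failure of the patching argument is visible already for two matrices. The sketch below, with hypothetical helper names and arbitrarily chosen matrices, forms the combinations \((1 - t)\, g_1 + t\, g_2\) that a partition of unity would produce: for two positive definite matrices every such combination stays positive definite, while for two nondegenerate matrices of signature \((1, 1)\) the combination at \(t = 1/2\) is the zero matrix.

```python
def det_2x2(g):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = g
    return a * d - b * c


def combine(t, g1, g2):
    """The combination (1 - t) g1 + t g2, entry by entry."""
    return [[(1 - t) * g1[i][j] + t * g2[i][j] for j in range(2)]
            for i in range(2)]


def is_positive_definite_2x2(g):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return g[0][0] > 0 and det_2x2(g) > 0


# Two positive definite matrices: every combination remains positive definite.
h1 = [[1.0, 0.0], [0.0, 1.0]]
h2 = [[2.0, 1.0], [1.0, 3.0]]
weights = [0.0, 0.25, 0.5, 0.75, 1.0]
assert all(is_positive_definite_2x2(combine(t, h1, h2)) for t in weights)

# Two nondegenerate matrices of signature (1, 1): the midpoint degenerates.
g1 = [[1.0, 0.0], [0.0, -1.0]]
g2 = [[-1.0, 0.0], [0.0, 1.0]]
assert det_2x2(g1) != 0 and det_2x2(g2) != 0
midpoint = combine(0.5, g1, g2)
assert det_2x2(midpoint) == 0
```

This shows only that the particular construction breaks down; that some manifolds admit no Lorentz metric at all is a separate, topological fact.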