Musical Isomorphisms and Pseudo-Riemannian Metrics

The Tangent-Cotangent Isomorphism

A finite-dimensional vector space and its dual have the same dimension and are therefore isomorphic, but the isomorphism is not canonical: it depends on a choice of basis. An inner product removes that ambiguity, pairing each vector with the covector that measures inner products against it. A Riemannian metric does this at every tangent space at once, and does so smoothly, producing a bundle isomorphism between the tangent and cotangent bundles that belongs to the manifold itself rather than to any chart.

Let \((M, g)\) be a Riemannian manifold with or without boundary. For each \(p \in M\) and each \(v \in T_pM\), the assignment \(w \mapsto g_p(v, w)\) is a linear functional on \(T_pM\), hence an element of the cotangent space \(T_p^*M\). Collecting these defines a map between bundles.

Definition: The Tangent-Cotangent Map

Let \((M, g)\) be a Riemannian manifold. The tangent-cotangent map is the bundle homomorphism \(\hat{g} : TM \to T^*M\) over \(M\) that assigns to each \(v \in T_pM\) the covector \(\hat{g}(v) \in T_p^*M\) defined by \[ \hat{g}(v)(w) = g_p(v, w) \qquad \text{for all } w \in T_pM. \]

That \(\hat{g}\) is a smooth bundle homomorphism is seen most easily through its action on vector fields. For \(X, Y \in \mathfrak{X}(M)\), define \(\hat{g}(X)\) by \[ \hat{g}(X)(Y) = g(X, Y), \] a smooth function on \(M\). As a function of \(Y\) this is linear over \(C^\infty(M)\), so by the tensor characterization lemma \(\hat{g}(X)\) is a smooth covector field. The resulting map \(X \mapsto \hat{g}(X)\) is in turn linear over \(C^\infty(M)\), so by the bundle homomorphism characterization lemma it is induced by a smooth bundle homomorphism \(TM \to T^*M\), which is exactly the pointwise map \(\hat{g}\) defined above. The same symbol denotes both the pointwise map and the induced map on sections \(\hat{g} : \mathfrak{X}(M) \to \mathfrak{X}^*(M)\).

The property that makes the construction useful is that this homomorphism is an isomorphism, so that vectors and covectors are interchangeable once a metric is fixed.

Theorem: The Tangent-Cotangent Isomorphism

For any Riemannian manifold \((M, g)\), the tangent-cotangent map \(\hat{g} : TM \to T^*M\) is a smooth bundle isomorphism.

Proof:

At each point \(p\), the map \(\hat{g} : T_pM \to T_p^*M\) is injective: if \(\hat{g}(v) = 0\), then \(g_p(v, w) = 0\) for all \(w \in T_pM\), and taking \(w = v\) gives \(g_p(v, v) = 0\), which forces \(v = 0\) by positive definiteness of the metric. Since \(T_pM\) and \(T_p^*M\) have the same finite dimension, an injective linear map between them is bijective. Thus \(\hat{g}\) is a smooth bundle homomorphism that is bijective on every fiber. Such a homomorphism is a smooth bundle isomorphism: in local frames it is represented by a smooth matrix-valued function that is invertible at each point, and the inverse of such a function is again smooth. Hence \(\hat{g}\) is a smooth bundle isomorphism with smooth inverse \(\hat{g}^{-1} : T^*M \to TM\).

Lowering and Raising Indices

The isomorphism and its inverse carry names borrowed from music. Sending a vector to its associated covector is written with a flat sign and called lowering an index; the inverse, sending a covector to its vector, is written with a sharp sign and called raising an index. On a single inner product space the same pair of operations is the Riesz correspondence between the space and its dual; here it is carried out at every tangent space at once.

Definition: The Flat Map (Lowering an Index)

Let \((M, g)\) be a Riemannian manifold. The flat map is the bundle isomorphism \(\flat = \hat{g} : TM \to T^*M\). For a vector field \(X \in \mathfrak{X}(M)\), its image is written \(X^\flat = \hat{g}(X)\), a covector field, said to be obtained from \(X\) by lowering an index.

In smooth coordinates \((x^i)\) the metric is \(g = g_{ij}\, dx^i\, dx^j\), so for \(X = X^i\, \partial/\partial x^i\) the defining relation \(X^\flat(Y) = g(X, Y)\) gives, on the coordinate vector field \(\partial/\partial x^j\), \[ X^\flat\!\left( \frac{\partial}{\partial x^j} \right) = g\!\left( X^i \frac{\partial}{\partial x^i},\, \frac{\partial}{\partial x^j} \right) = g_{ij}\, X^i. \] Writing \(X^\flat = X_j\, dx^j\), this reads \(X_j = g_{ij}\, X^i\): the components of the covector are obtained from those of the vector by contracting with the metric, lowering the upper index to a lower one. The matrix of the flat map in coordinate frames is exactly the matrix \((g_{ij})\) of the metric.
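As a quick numerical check of \(X_j = g_{ij}\, X^i\), the sketch below lowers an index at a single point in plain Python. It assumes the metric is supplied as its component matrix at that point; the matrix \((g_{ij})\) and the vectors are hypothetical values chosen only for illustration.

```python
def lower_index(g, X):
    """Covector components X_j = sum_i g_ij X^i (lowering an index at one point)."""
    n = len(X)
    return [sum(g[i][j] * X[i] for i in range(n)) for j in range(n)]


def metric(g, X, Y):
    """The inner product g(X, Y) = sum_ij g_ij X^i Y^j."""
    n = len(X)
    return sum(g[i][j] * X[i] * Y[j] for i in range(n) for j in range(n))


# Hypothetical metric components at a point (symmetric, positive definite).
g = [[2, 1], [1, 3]]
X = [1, 2]                      # X = 1 d/dx^1 + 2 d/dx^2
X_flat = lower_index(g, X)      # covector components X_j
print(X_flat)                   # [4, 7]

# X_flat acts on Y exactly as g(X, .) does: both evaluate to 13 here.
Y = [5, -1]
print(sum(a * b for a, b in zip(X_flat, Y)), metric(g, X, Y))   # 13 13
```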

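The inverse operation requires the inverse of that matrix. At each point the matrix \((g_{ij})\) is positive definite, hence invertible; denote its inverse by \((g^{ij})\), the matrix-valued function characterized pointwise by \[ g^{ij}\, g_{jk} = \delta^i_k. \] Its entries are smooth functions, since by Cramer's rule they are polynomials in the \(g_{ij}\) divided by the nowhere-vanishing determinant \(\det(g_{ij})\). Since \((g_{ij})\) is symmetric, so is \((g^{ij})\).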
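Computing \((g^{ij})\) at a point is ordinary matrix inversion. The following minimal sketch, assuming a hypothetical metric matrix at one point, uses Gauss-Jordan elimination with exact rational arithmetic (Python's `fractions` module) so that \(g^{ij}\, g_{jk} = \delta^i_k\) and the symmetry of the inverse can be checked exactly rather than up to rounding.

```python
from fractions import Fraction


def inverse_matrix(g):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(g)
    # Augmented matrix [g | I].
    a = [[Fraction(g[i][j]) for j in range(n)] + [Fraction(int(i == j)) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        # Choose a row with a nonzero entry in this column as the pivot row.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


g = [[2, 1], [1, 3]]             # hypothetical metric components at a point
g_inv = inverse_matrix(g)        # equals (1/5) * [[3, -1], [-1, 2]]
n = len(g)

# Check g^{ij} g_{jk} = delta^i_k and the symmetry of the inverse.
for i in range(n):
    for k in range(n):
        assert sum(g_inv[i][j] * g[j][k] for j in range(n)) == (1 if i == k else 0)
assert all(g_inv[i][j] == g_inv[j][i] for i in range(n) for j in range(n))
```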

Definition: The Sharp Map (Raising an Index)

Let \((M, g)\) be a Riemannian manifold. The sharp map is the inverse bundle isomorphism \(\sharp = \hat{g}^{-1} : T^*M \to TM\). For a covector field \(\omega \in \mathfrak{X}^*(M)\), its image is written \(\omega^\sharp = \hat{g}^{-1}(\omega)\), a vector field, said to be obtained from \(\omega\) by raising an index. In coordinates, for \(\omega = \omega_j\, dx^j\), \[ \omega^\sharp = \omega^i\, \frac{\partial}{\partial x^i} , \qquad \omega^i = g^{ij}\, \omega_j, \] the components of the vector obtained by contracting those of the covector with the inverse metric.
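Raising an index can be sketched the same way. Assuming the same kind of hypothetical \(2 \times 2\) metric matrix at a point, the code below inverts it with the adjugate formula, applies \(\omega^i = g^{ij}\, \omega_j\), and checks that sharp and flat undo each other.

```python
from fractions import Fraction


def inverse_2x2(g):
    """Inverse of a 2x2 matrix [[a, b], [c, d]] via the adjugate formula."""
    (a, b), (c, d) = g
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("metric matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def raise_index(g_inv, omega):
    """Vector components omega^i = sum_j g^{ij} omega_j."""
    n = len(omega)
    return [sum(g_inv[i][j] * omega[j] for j in range(n)) for i in range(n)]


def lower_index(g, X):
    """Covector components X_j = sum_i g_ij X^i."""
    n = len(X)
    return [sum(g[i][j] * X[i] for i in range(n)) for j in range(n)]


g = [[2, 1], [1, 3]]            # hypothetical metric components at a point
g_inv = inverse_2x2(g)
omega = [4, 7]                  # omega = 4 dx^1 + 7 dx^2
omega_sharp = raise_index(g_inv, omega)
print(omega_sharp)              # [Fraction(1, 1), Fraction(2, 1)]

# Sharp and flat are mutually inverse.
assert lower_index(g, omega_sharp) == omega
assert raise_index(g_inv, lower_index(g, [5, -1])) == [5, -1]
```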

On a single inner product space the same inverse operation recovers a vector from a covector through the Riesz representation; the coordinate formula \(\omega^i = g^{ij}\,\omega_j\) is what that recovery becomes in a basis that need not be orthonormal, so that the matrix of the inner product is no longer the identity. A mnemonic keeps the two maps straight: in music a flat lowers a note and a sharp raises it, just as \(\flat\) lowers an index and \(\sharp\) raises one. Geometrically, the value of \(\omega^\sharp\) is a vector, visualized as an arrow, while the value of \(X^\flat\) is a covector, visualized through its level sets.

The Gradient

The sharp map recovers the gradient as a vector field. In elementary calculus the gradient of a function is a vector, but the natural derivative of a function on a manifold is its differential, a covector. Raising the index of the differential converts it back into a vector field, and that vector field is the gradient. The construction depends on the metric, which is why the gradient is a Riemannian notion while the differential is not.

Definition: The Gradient

Let \((M, g)\) be a Riemannian manifold and \(f \in C^\infty(M)\). The gradient of \(f\) is the vector field obtained by raising the index of the differential, \[ \operatorname{grad} f = (df)^\sharp = \hat{g}^{-1}(df). \]

Unraveling the definition gives the property that characterizes the gradient and matches the elementary one. On a single inner product space this is the Riesz description of the gradient; here the metric supplies it pointwise.

Theorem: Characterization of the Gradient

For \(f \in C^\infty(M)\), the gradient \(\operatorname{grad} f\) is the unique vector field satisfying \[ \langle \operatorname{grad} f,\, X \rangle_g = df(X) = Xf \qquad \text{for every } X \in \mathfrak{X}(M). \]

Proof:

Apply \(\hat{g}\) to \(\operatorname{grad} f = \hat{g}^{-1}(df)\): since \(\hat{g}\) and \(\hat{g}^{-1}\) are inverse, \(\hat{g}(\operatorname{grad} f) = df\). Evaluating both sides on a vector field \(X\) and using the definition \(\hat{g}(V)(X) = \langle V, X \rangle_g\) gives \(\langle \operatorname{grad} f, X \rangle_g = df(X)\), and \(df(X) = Xf\) by the definition of the differential. Uniqueness follows because \(\hat{g}\) is an isomorphism: any vector field \(V\) with \(\langle V, X \rangle_g = df(X)\) for all \(X\) satisfies \(\hat{g}(V) = df\), hence \(V = \hat{g}^{-1}(df) = \operatorname{grad} f\).

In coordinates the gradient is the index-raised differential. Since \(df = (\partial f / \partial x^i)\, dx^i\), raising the index gives \[ \operatorname{grad} f = g^{ij}\, \frac{\partial f}{\partial x^i}\, \frac{\partial}{\partial x^j}. \] Its coefficients are smooth, so the gradient is a smooth vector field. On \(\mathbb{R}^n\) with the Euclidean metric in standard coordinates, where \(g^{ij} = \delta^{ij}\), it reduces to the familiar \(\operatorname{grad} f = \sum_i (\partial f / \partial x^i)\, \partial/\partial x^i\); in other coordinates the form is generally different, because the inverse metric need no longer be the identity.
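The coordinate formula and the characterizing identity can be checked together. This sketch assumes a hypothetical constant metric on \(\mathbb{R}^2\) (any constant symmetric positive definite matrix defines one) and the test function \(f(x, y) = x^2 + xy\) at the point \((1, 2)\), with its partial derivatives computed by hand; the helper names are illustrative.

```python
from fractions import Fraction


def inverse_2x2(g):
    """Inverse of a 2x2 matrix [[a, b], [c, d]] via the adjugate formula."""
    (a, b), (c, d) = g
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def gradient(g, df):
    """Components (grad f)^j = sum_i g^{ij} df_i of the index-raised differential."""
    g_inv = inverse_2x2(g)
    n = len(df)
    return [sum(g_inv[i][j] * df[i] for i in range(n)) for j in range(n)]


def inner(g, X, Y):
    """<X, Y>_g = sum_ij g_ij X^i Y^j."""
    n = len(X)
    return sum(g[i][j] * X[i] * Y[j] for i in range(n) for j in range(n))


# Hypothetical constant metric on R^2 and f(x, y) = x**2 + x*y at the point (1, 2).
g = [[2, 1], [1, 3]]
x, y = 1, 2
df = [2 * x + y, x]             # (df/dx, df/dy) = (4, 1)
grad_f = gradient(g, df)        # [Fraction(11, 5), Fraction(-2, 5)]

# The characterizing identity <grad f, X>_g = df(X) = Xf, checked on a few X.
for X in ([1, 0], [0, 1], [3, -5]):
    assert inner(g, grad_f, X) == sum(d * v for d, v in zip(df, X))

# With the Euclidean metric the gradient is just the tuple of partials.
assert gradient([[1, 0], [0, 1]], df) == df
```

Replacing `g` by the identity matrix returns the bare partial derivatives, matching the Euclidean special case; any other metric changes the components.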

Example (Gradient in Polar Coordinates):

On the open subset of \(\mathbb{R}^2\) where polar coordinates \((r, \theta)\) are defined, the Euclidean metric is \(dr^2 + r^2\, d\theta^2\), with matrix \(\operatorname{diag}(1, r^2)\) and inverse \(\operatorname{diag}(1, r^{-2})\). For \(f \in C^\infty(\mathbb{R}^2)\), inserting \((g^{ij}) = \operatorname{diag}(1, r^{-2})\) into the coordinate formula yields \[ \operatorname{grad} f = \frac{\partial f}{\partial r}\, \frac{\partial}{\partial r} + \frac{1}{r^2}\, \frac{\partial f}{\partial \theta}\, \frac{\partial}{\partial \theta}. \] The angular term carries the factor \(r^{-2}\), absent in the Cartesian expression. One factor of \(r^{-1}\) converts \(\partial f/\partial\theta\) into a rate of change per unit distance, since a unit change in \(\theta\) moves a point along an arc of length \(r\); the other compensates for the length \(r\) of the coordinate vector \(\partial/\partial\theta\). In terms of the unit vector \(r^{-1}\,\partial/\partial\theta\), the angular component is \(r^{-1}\,\partial f/\partial\theta\).
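The polar formula can be tested numerically against the Cartesian gradient. The sketch below assumes the test function \(f = xy\), whose Cartesian gradient is \((y, x)\), approximates \(\partial f/\partial r\) and \(\partial f/\partial\theta\) by central differences, applies \(g^{ij} = \operatorname{diag}(1, r^{-2})\), and converts the result back to the Cartesian frame using \(\partial/\partial r = (\cos\theta, \sin\theta)\) and \(\partial/\partial\theta = (-r\sin\theta, r\cos\theta)\). The evaluation point and the difference step are arbitrary choices.

```python
import math


def partial(fn, args, k, h=1e-6):
    """Central-difference approximation to the k-th partial derivative of fn at args."""
    up, down = list(args), list(args)
    up[k] += h
    down[k] -= h
    return (fn(*up) - fn(*down)) / (2 * h)


def f_cartesian(x, y):
    return x * y                 # hypothetical test function f = xy


def f_polar(r, theta):
    return f_cartesian(r * math.cos(theta), r * math.sin(theta))


def grad_polar(r, theta):
    """Coefficients (a, b) of grad f = a d/dr + b d/dtheta, using g^{ij} = diag(1, r^-2)."""
    f_r = partial(f_polar, (r, theta), 0)
    f_theta = partial(f_polar, (r, theta), 1)
    return f_r, f_theta / r**2


def to_cartesian(r, theta, a, b):
    """Rewrite a d/dr + b d/dtheta in the frame d/dx, d/dy."""
    # d/dr = (cos t, sin t) and d/dtheta = (-r sin t, r cos t) in Cartesian components.
    c, s = math.cos(theta), math.sin(theta)
    return a * c - b * r * s, a * s + b * r * c


r, theta = 2.0, 0.7
gx, gy = to_cartesian(r, theta, *grad_polar(r, theta))
x, y = r * math.cos(theta), r * math.sin(theta)
# The Cartesian gradient of f = xy is (y, x); the polar formula should reproduce it.
print(abs(gx - y) < 1e-6 and abs(gy - x) < 1e-6)   # True
```

Dropping the \(r^{-2}\) factor in `grad_polar` makes the comparison fail at this point, since there \(r \neq 1\) and \(\partial f/\partial\theta \neq 0\).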

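The characterizing identity says that \(\operatorname{grad} f\) is the vector field whose metric inner product with any \(X\) reproduces the directional derivative \(Xf\). Combined with the Cauchy-Schwarz inequality, this gives \(Xf = \langle \operatorname{grad} f, X \rangle_g \le |\operatorname{grad} f|_g\) for every unit vector \(X\), and when \(\operatorname{grad} f \ne 0\) equality holds exactly for \(X = \operatorname{grad} f / |\operatorname{grad} f|_g\). So wherever the gradient is nonzero, its direction is the one in which \(f\) increases fastest and its length is the maximum directional derivative of \(f\) over unit vectors. Moreover \(\langle \operatorname{grad} f, X \rangle_g = Xf = 0\) for every \(X\) tangent to a regular level set of \(f\), so the gradient is orthogonal to the level sets. These are the same geometric facts the gradient has in Euclidean space, now read off through the metric \(g\).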
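The steepest-ascent property can be observed numerically. Assuming a hypothetical constant metric on \(\mathbb{R}^2\) and hypothetical components of \(df\) at one point, the sketch below samples many \(g\)-unit vectors \(X\), records the largest value of \(Xf = df(X)\), and compares it with \(|\operatorname{grad} f|_g\); it also checks that a vector annihilated by \(df\), hence tangent to the level set, is \(g\)-orthogonal to the gradient.

```python
import math

# Hypothetical data: a constant metric on R^2 and the components of df at one point.
g = [[2.0, 1.0], [1.0, 3.0]]
df = [4.0, 1.0]


def inner(X, Y):
    """<X, Y>_g = sum_ij g_ij X^i Y^j."""
    return sum(g[i][j] * X[i] * Y[j] for i in range(2) for j in range(2))


# Raise the index: (grad f)^j = g^{ij} df_i, with g^{ij} from the 2x2 adjugate formula.
det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
g_inv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]
grad = [sum(g_inv[i][j] * df[i] for i in range(2)) for j in range(2)]
grad_norm = math.sqrt(inner(grad, grad))

# Largest directional derivative Xf = df(X) over a fine sample of g-unit vectors X.
best = -math.inf
for k in range(10000):
    phi = 2 * math.pi * k / 10000
    v = [math.cos(phi), math.sin(phi)]
    length = math.sqrt(inner(v, v))
    X = [v[0] / length, v[1] / length]
    best = max(best, df[0] * X[0] + df[1] * X[1])

print(best <= grad_norm + 1e-9, abs(best - grad_norm) < 1e-3)   # True True

# A vector tangent to the level set (df(T) = 0) is g-orthogonal to grad f.
T = [1.0, -4.0]
print(abs(inner(grad, T)) < 1e-12)   # True
```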

Steepest Descent and the Choice of Metric

The coordinate formula \(\operatorname{grad} f = g^{ij}\,\partial_i f\,\partial_j\) shows that the gradient is not the bare tuple of partial derivatives but that tuple transformed by the inverse metric. A learning system that adjusts parameters in the direction of fastest decrease of a loss moves along the negative of a gradient, and which direction that is depends on the metric placed on the parameter space. Using the identity matrix as the metric recovers ordinary gradient descent; using a metric adapted to the geometry of the problem yields a different and often more stable direction. When the parameter space is a space of probability distributions, a natural choice of metric is the Fisher information matrix, built from the distributions themselves, and the corresponding gradient is the natural gradient \(F(\theta)^{-1}\nabla\mathcal{L}\), in which the inverse Fisher matrix plays exactly the role of \((g^{ij})\) above. Its negative gives the direction of steepest descent measured in the intrinsic geometry of the model rather than in the arbitrary coordinates of the parameters; the resulting natural gradient descent is developed as a method in its own right elsewhere. This is the entry point to information geometry, where the metric on a statistical model and the gradients it induces become the principal objects of study.
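A small worked example of the natural gradient, offered as a sketch under explicit assumptions: the model is a univariate Gaussian \(N(\mu, \sigma^2)\) parametrized by \(\theta = (\mu, \sigma)\), for which the Fisher information matrix is \(\operatorname{diag}(1/\sigma^2, 2/\sigma^2)\); the loss \(\mathcal{L}\) is the average negative log-likelihood of a few hypothetical observations; and the step size is an arbitrary choice. The inverse Fisher matrix is applied exactly as \((g^{ij})\) is applied to \(df\).

```python
def nll_gradient(mu, sigma, data):
    """Ordinary gradient of the average negative log-likelihood of N(mu, sigma^2)."""
    m = len(data)
    mean_dev = sum(x - mu for x in data) / m
    mean_sq = sum((x - mu) ** 2 for x in data) / m
    return -mean_dev / sigma**2, 1.0 / sigma - mean_sq / sigma**3


def natural_gradient(mu, sigma, data):
    """F^{-1} grad L for the Fisher matrix F = diag(1/sigma^2, 2/sigma^2) in (mu, sigma)."""
    d_mu, d_sigma = nll_gradient(mu, sigma, data)
    # F^{-1} = diag(sigma^2, sigma^2 / 2) plays the role of the inverse metric g^{ij}.
    return sigma**2 * d_mu, sigma**2 / 2 * d_sigma


data = [1.0, 2.0, 4.0, 5.0]      # hypothetical observations
mu, sigma, lr = 0.0, 1.0, 0.5    # hypothetical starting point and step size
for _ in range(200):
    n_mu, n_sigma = natural_gradient(mu, sigma, data)
    # Descent moves against the natural gradient.
    mu, sigma = mu - lr * n_mu, sigma - lr * n_sigma

# The iterates approach the maximum-likelihood fit: the sample mean 3.0
# and the (biased) sample standard deviation sqrt(2.5).
print(mu, sigma)
```

With this step size the update for \(\mu\) reduces to \(\mu \leftarrow \mu + \tfrac12(\bar{x} - \mu)\), where \(\bar{x}\) is the sample mean, independently of \(\sigma\); the ordinary gradient step would instead scale with \(1/\sigma^2\) and become very large when \(\sigma\) is small. This is one concrete way in which the choice of metric changes the descent direction.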

Pseudo-Riemannian Metrics

Everything built so far rests on one property of the metric: that the tangent-cotangent map it induces is an isomorphism. Positive definiteness was used to prove that, but it is more than is needed. Relaxing positivity while keeping the isomorphism yields a broader class of metrics, the setting for the geometry underlying relativity, in which the squared length \(g(v, v)\) of a nonzero vector may be negative or zero, yet the musical correspondence between vectors and covectors survives intact.

The property that the tangent-cotangent map is an isomorphism has a name at the level of a single vector space. A symmetric \(2\)-tensor \(g\) on a vector space \(V\) is called nondegenerate if the linear map \(\hat{g} : V \to V^*\) defined by \(\hat{g}(v)(w) = g(v, w)\) is an isomorphism; equivalently, if for every nonzero \(v \in V\) there is some \(w \in V\) with \(g(v, w) \ne 0\). A positive definite \(g\) is nondegenerate, since \(g(v, v) > 0\) already supplies such a \(w = v\); but nondegeneracy is weaker, requiring only that no nonzero vector be orthogonal to everything.
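Since the matrix of \(\hat{g}\) in a basis is \((g_{ij})\), nondegeneracy on a finite-dimensional space amounts to \(\det(g_{ij}) \neq 0\). The sketch below illustrates the distinction on \(\mathbb{R}^2\) with two hypothetical tensors: \(\operatorname{diag}(-1, 1)\), which is nondegenerate yet has a nonzero vector \(v\) with \(g(v, v) = 0\), and \(\operatorname{diag}(1, 0)\), which is degenerate.

```python
def det2(m):
    """Determinant of a 2x2 matrix; the matrix of g-hat is (g_ij), so det != 0 iff nondegenerate."""
    (a, b), (c, d) = m
    return a * d - b * c


def pairing(g, v, w):
    """g(v, w) = sum_ij g_ij v^i w^j."""
    return sum(g[i][j] * v[i] * w[j] for i in range(2) for j in range(2))


minkowski = [[-1, 0], [0, 1]]   # nondegenerate but not positive definite
degenerate = [[1, 0], [0, 0]]   # symmetric but degenerate

v = [1, 1]
print(pairing(minkowski, v, v))        # 0: g(v, v) vanishes for this nonzero v
print(pairing(minkowski, v, [1, 0]))   # -1: yet some w pairs nontrivially with v
print(det2(minkowski), det2(degenerate))   # -1 0

# For the degenerate tensor, u = (0, 1) is orthogonal to both basis vectors, hence to everything.
u = [0, 1]
print(pairing(degenerate, u, [1, 0]), pairing(degenerate, u, [0, 1]))   # 0 0
```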

Just as an inner product can be diagonalized to the identity by passing to an orthonormal basis, any nondegenerate symmetric \(2\)-tensor on an \(n\)-dimensional space can be brought by a change of basis to a diagonal form with every entry equal to \(+1\) or \(-1\). The number \(r\) of positive entries and the number \(s\) of negative entries, with \(r + s = n\), do not depend on the basis chosen; this invariance is Sylvester's law of inertia, a classical theorem on symmetric bilinear forms that we take as known. The pair \((r, s)\) is called the signature of the tensor.
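The signature can be computed without eigenvalues by diagonalizing \(g\) through congruence, \(g \mapsto P^{\mathsf T} g P\), using symmetric row-and-column elimination (Lagrange's reduction). The sketch below is one straightforward way to do this, not the only one; the example matrices and the change of basis \(P\) are hypothetical, and the first printed line illustrates the basis independence that the law of inertia guarantees.

```python
from fractions import Fraction


def signature(g):
    """Return (r, s): the numbers of positive and negative entries after
    diagonalizing the symmetric matrix g by congruence (g -> P^T g P)."""
    a = [[Fraction(x) for x in row] for row in g]
    n = len(a)
    diagonal = []
    for k in range(n):
        if a[k][k] == 0:
            j = next((j for j in range(k + 1, n) if a[j][j] != 0), None)
            if j is not None:
                # Swap basis vectors e_k and e_j (rows and columns k, j).
                a[k], a[j] = a[j], a[k]
                for row in a:
                    row[k], row[j] = row[j], row[k]
            else:
                j = next((j for j in range(k + 1, n) if a[k][j] != 0), None)
                if j is not None:
                    # Replace e_k by e_k + e_j; the new (k, k) entry is 2 * a[k][j] != 0.
                    a[k] = [x + y for x, y in zip(a[k], a[j])]
                    for row in a:
                        row[k] += row[j]
        p = a[k][k]
        diagonal.append(p)
        if p == 0:
            continue   # row k is already zero, so this direction is degenerate
        for i in range(k + 1, n):
            f = a[i][k] / p
            # Row operation followed by the matching column operation keeps a symmetric.
            a[i] = [x - f * y for x, y in zip(a[i], a[k])]
            for row in a:
                row[i] -= f * row[k]
    return sum(d > 0 for d in diagonal), sum(d < 0 for d in diagonal)


def congruent(g, P):
    """Return P^T g P, the components of g in the basis given by the columns of P."""
    n = len(g)
    return [[sum(P[p][i] * g[p][q] * P[q][j] for p in range(n) for q in range(n))
             for j in range(n)] for i in range(n)]


lorentz = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]
P = [[1, 2, 0], [0, 1, 3], [1, 0, 1]]        # hypothetical invertible change of basis
print(signature(lorentz), signature(congruent(lorentz, P)))   # (2, 1) (2, 1)
print(signature([[0, 1], [1, 0]]))                              # (1, 1)
print(signature([[2, 1], [1, 3]]))                              # (2, 0)
```

The same routine also reports degenerate tensors: for those, \(r + s < n\).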

Definition: Pseudo-Riemannian and Lorentz Metrics

A pseudo-Riemannian metric on a smooth \(n\)-manifold \(M\) is a smooth symmetric \(2\)-tensor field whose value at each point is nondegenerate, with the same signature at every point. A pseudo-Riemannian metric of signature \((n - 1, 1)\) or \((1, n - 1)\), depending on convention, is called a Lorentz metric; metrics of this kind model spacetime in the general theory of relativity, where the single distinguished sign corresponds to the time direction.

A Riemannian metric is the special case of signature \((n, 0)\). For a pseudo-Riemannian metric the entire tangent-cotangent apparatus continues to function: nondegeneracy is exactly the condition that \(\hat{g}\) be a bundle isomorphism, so the flat and sharp maps, the lowering and raising of indices, and the gradient are all defined verbatim, with the inverse matrix \((g^{ij})\) now having entries of either sign. What is lost is the notion of length: \(g(v, v)\) may be positive, negative, or zero for nonzero \(v\), so it no longer defines a norm, and the distance theory of the previous development has no direct counterpart.
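Under the convention \(\operatorname{diag}(-1, 1, 1, 1)\) on \(\mathbb{R}^4\) with hypothetical coordinates \((t, x, y, z)\), a Lorentz metric of signature \((3, 1)\), the same lowering and raising code runs unchanged. The sketch below exhibits a nonzero null vector, confirms that sharp still inverts flat, and raises the differential of \(t\), which yields \(\operatorname{grad} t = -\partial/\partial t\) because the inverse metric has a negative entry.

```python
# Minkowski metric diag(-1, 1, 1, 1) on R^4 in coordinates (t, x, y, z).
eta = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
eta_inv = eta   # diag(-1, 1, 1, 1) is its own inverse


def lower_index(g, X):
    """X_j = sum_i g_ij X^i."""
    n = len(X)
    return [sum(g[i][j] * X[i] for i in range(n)) for j in range(n)]


def raise_index(g_inv, omega):
    """omega^i = sum_j g^{ij} omega_j."""
    n = len(omega)
    return [sum(g_inv[i][j] * omega[j] for j in range(n)) for i in range(n)]


def quadratic(g, v):
    """g(v, v), which need not be positive for a Lorentz metric."""
    n = len(v)
    return sum(g[i][j] * v[i] * v[j] for i in range(n) for j in range(n))


v = [1, 1, 0, 0]
print(quadratic(eta, v))                                # 0: v is a nonzero null vector
print(lower_index(eta, v))                              # [-1, 1, 0, 0]
print(raise_index(eta_inv, lower_index(eta, v)) == v)   # True

# The gradient of the time coordinate t: df = dt, so grad t = -d/dt here.
dt = [1, 0, 0, 0]
print(raise_index(eta_inv, dt))                         # [-1, 0, 0, 0]
```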

One feature does not survive the generalization. The existence of a Riemannian metric on every smooth manifold was proved by patching local metrics together with a partition of unity, an argument that relied on a positive combination of positive definite tensors remaining positive definite. For pseudo-Riemannian metrics this fails, since a combination of nondegenerate tensors need not be nondegenerate: on \(\mathbb{R}^2\), the average of \(\operatorname{diag}(1, -1)\) and \(\operatorname{diag}(-1, 1)\) is the zero tensor. Indeed, not every manifold admits a Lorentz metric; the sphere \(\mathbb{S}^2\), for instance, admits none. The passage from Riemannian to pseudo-Riemannian geometry thus trades the guarantee of existence for the reach into spaces where time and space carry opposite signs.
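The failure of the partition-of-unity argument can be seen in the simplest case. The sketch below, using hypothetical tensors at a single point of \(\mathbb{R}^2\), forms the combinations \(tA + (1 - t)B\) for a few values of \(t \in [0, 1]\): for two tensors of signature \((1, 1)\) the determinant vanishes at \(t = 1/2\), while for two positive definite tensors every combination stays positive definite (checked with the \(2 \times 2\) criterion: positive upper-left entry and positive determinant).

```python
from fractions import Fraction


def combine(t, A, B):
    """The pointwise combination t*A + (1 - t)*B, as in a partition-of-unity argument."""
    return [[t * a + (1 - t) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]


def det2(m):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c


ts = [Fraction(k, 4) for k in range(5)]   # t = 0, 1/4, 1/2, 3/4, 1

# Two tensors of signature (1, 1) on R^2: the combination degenerates at t = 1/2.
A = [[1, 0], [0, -1]]
B = [[-1, 0], [0, 1]]
print([str(det2(combine(t, A, B))) for t in ts])   # ['-1', '-1/4', '0', '-1/4', '-1']

# Two positive definite tensors: every combination stays positive definite.
P = [[1, 0], [0, 2]]
Q = [[3, 1], [1, 1]]
assert all(combine(t, P, Q)[0][0] > 0 and det2(combine(t, P, Q)) > 0 for t in ts)
```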