Lectures on Faster-than-Light Travel and Time Travel

Prof. Barak Shoshany


Published in
SciPost Phys. Lect. Notes 10 (2019)
DOI: 10.21468/SciPostPhysLectNotes.10


These lecture notes were prepared for a 25-hour course for advanced undergraduate students participating in Perimeter Institute’s Undergraduate Summer Program. The lectures cover some of what is currently known about the possibility of superluminal travel and time travel within the context of established science, that is, general relativity and quantum field theory. Previous knowledge of general relativity at the level of a standard undergraduate-level introductory course is recommended, but all the relevant material is included for completeness and reference. No previous knowledge of quantum field theory, or anything else beyond the standard undergraduate curriculum, is required. Advanced topics in relativity, such as causal structures, the Raychaudhuri equation, and the energy conditions are presented in detail. Once the required background is covered, concepts related to faster-than-light travel and time travel are discussed. After introducing tachyons in special relativity as a warm-up, exotic spacetime geometries in general relativity such as warp drives and wormholes are discussed and analyzed, including their limitations. Time travel paradoxes are also discussed in detail, including some of their proposed resolutions.

1 Introduction

“Space is big. Really big. You just won’t believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it’s a long way down the road to the chemist, but that’s just peanuts to space.”
– Douglas Adams, The Hitchhiker’s Guide to the Galaxy

In science fiction, whenever the plot encompasses more than one solar system, faster-than-light travel is an almost unavoidable necessity. In reality, however, it seems that space travel is limited by the speed of light. In fact, even just accelerating to a significant fraction of the speed of light is already a hard problem by itself, e.g. due to the huge energy requirements and the danger of high-speed collisions with interstellar dust. However, this is a problem of engineering, not physics – and we may assume that, given sufficiently advanced technology, it could eventually be solved.

Unfortunately, even once we are able to build and power a spaceship that can travel at close to the speed of light, the problem remains that interstellar distances are measured in light years, and such journeys will therefore take years to complete, no matter how close we can get to the speed of light. The closest star system, Alpha Centauri, is roughly 4.4 light years away, and thus the trip would take at least 4.4 years to complete. Other star systems with potentially habitable exoplanets are located tens, hundreds, or even thousands of light years away; the diameter of the Milky Way galaxy is estimated at 175 ± 25 thousand light years. [Footnote 1: And, for intergalactic travel, the Andromeda galaxy, for example, is 2.54 ± 0.11 million light years away. But let’s solve one problem at a time…]

For a one-way trip, the long time it takes to reach the destination may not be an insurmountable obstacle. First of all, humanity is already planning to send people to Mars, a journey which is estimated to take around 9 months. Thus it is not inconceivable to send people on a journey which will take several years, especially if technological advances make the trip more tolerable.

Second, relativistic time dilation means that, while for an observer on Earth it would seem that the spaceship takes at least 4.4 years to complete the trip to Alpha Centauri, an observer on the spaceship will measure an arbitrarily short proper time on their own clock, which will get shorter the closer they get to the speed of light. [Footnote 2: Recall that the time dilation factor is given (in units where $c \equiv 1$) by $\gamma = 1/\sqrt{1 - v^2}$, which is equal to 1 for $v = 0$ but approaches infinity in the limit $v \to 1$, that is, as the velocity approaches the speed of light.]
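To get a feel for the numbers, the following minimal Python sketch evaluates the time dilation factor $\gamma = 1/\sqrt{1 - v^2}$ for a one-way trip at constant speed, ignoring the acceleration and deceleration phases. The function names `lorentz_factor` and `trip_times` are illustrative choices and not part of the original notes; the 4.4 light-year distance is the rough figure quoted above.

```python
import math


def lorentz_factor(v):
    """Time dilation factor gamma = 1/sqrt(1 - v^2), with v in units of c."""
    if abs(v) >= 1.0:
        raise ValueError("speed must satisfy |v| < 1 in units where c = 1")
    return 1.0 / math.sqrt(1.0 - v * v)


def trip_times(distance_ly, v):
    """Return (Earth-frame time, ship proper time) in years.

    Assumes a one-way trip at constant speed 0 < v < 1 (illustrative
    simplification: no acceleration phases).
    """
    if v <= 0.0:
        raise ValueError("speed must be positive")
    earth_time = distance_ly / v
    ship_time = earth_time / lorentz_factor(v)
    return earth_time, ship_time


if __name__ == "__main__":
    for v in (0.5, 0.8, 0.99, 0.9999):
        earth, ship = trip_times(4.4, v)
        print(f"v = {v}: Earth time = {earth:.2f} yr, ship time = {ship:.3f} yr")
```

For instance, at $v = 0.8$ the trip takes 5.5 years as seen from Earth but only 3.3 years on the ship’s clock, and the ship time keeps shrinking as $v \to 1$ while the Earth time approaches 4.4 years.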

Furthermore, both scientists and science fiction authors have even imagined journeys lasting hundreds or even thousands of years, while the passengers are in suspended animation in order for them to survive the long journey. Others have envisioned generation ships, where only the distant descendants of the original passengers actually make it to the destination. [Footnote 3: Another possibility is that, by the time humanity develops interstellar travel, humans will also have much longer lifespans and/or will be able to inhabit artificial bodies, which would make a journey lasting decades not seem so long any more.]

However, while a trip lasting many years may be possible – and, indeed, might well be the only way humankind could ever realistically reach distant star systems, no matter how advanced our technology becomes – this kind of trip is only feasible for initial colonization of distant planets. It is hard to imagine going on vacation to an exotic resort on the planet Proxima Centauri b in the Alpha Centauri system, when the journey takes 4.4 years or more. Moreover, due to the relativistic time dilation mentioned above, when the tourists finally arrive back on Earth they will discover that all of their friends and relatives have long ago died of old age – not a fun vacation at all!

So far, the scenarios mentioned are safely within the realm of established science. However, science fiction writers often find them to be too restrictive. In a typical science-fiction scenario, the captain of a spaceship near Earth might instantaneously receive news of an alien attack on a human colony on Proxima Centauri b and, after a quick 4.4-light-year journey using the ship’s warp drive, arrive at the exoplanet just in time to stop the aliens. [Footnote 4: And, speaking of aliens, most scenarios where aliens from a distant planet visit Earth assume the aliens are capable of superluminal travel.] Such scenarios require faster-than-light communication and travel, both of which are considered by most to be disallowed by the known laws of physics.

Another prominent staple of science fiction is the concept of time travel. [Footnote 5: In these notes, by “time travel” we will always mean travel to the past. Travel to the future is trivial – you can just sit and wait for time to pass, or use special or general relativistic time dilation to make it pass faster – and does not violate causality or create any paradoxes.] However, any use of a time machine seems to bluntly violate the principle of causality, and inherently bring about irreconcilable paradoxes such as the so-called grandfather paradox. Unfortunately, works of science fiction which treat time travel paradoxes in a logical and consistent manner are extremely rare, and this is no surprise given that even we physicists don’t really understand how to make it consistent. As we will see below, time travel paradoxes may, in fact, be resolved, but a concrete mathematical model of paradox-free time travel has not yet been constructed.

In writing these notes, I relied heavily on the three excellent books on the subject by Visser [60], Lobo [38] and Krasnikov [33], as well as the popular general relativity textbooks by Carroll [6], Wald [62], Hawking & Ellis [23], and Poisson [55]. Many of the definitions, proofs and discussions are based on the material in these books. The reader is also encouraged to read Baez & Muniain [3] for a great introduction to relevant concepts in differential geometry.

Importantly, throughout these notes we will be using Planck units, where $c = G = \hbar = 1$, and our metric signature of choice will be $(-,+,+,+)$. [Footnote 6: Another popular convention in the literature is the reduced Planck units, where instead of $G = 1$ one takes $8\pi G = 1$. This simplifies some equations (e.g. the Einstein equation becomes $G_{\mu\nu} = T_{\mu\nu}$ instead of $G_{\mu\nu} = 8\pi T_{\mu\nu}$) but complicates others (e.g. the coefficient of $-dt^2$ in the Schwarzschild metric becomes $1 - M/(4\pi r)$ instead of $1 - 2M/r$). Here we will take $G = 1$, which seems like the more natural choice, since it continues the trend of setting fundamental dimensionful constants to 1.]

2 An Outline of General Relativity

2.1 Basic Concepts

2.1.1 Manifolds and Metrics

Let spacetime be represented by a 4-dimensional pseudo-Riemannian manifold $M$ equipped with a metric $\mathbf{g}$ of signature $(-,+,+,+)$. [Footnote 7: This means that $\mathbf{g}$ has one negative and three positive eigenvalues.] The metric is a symmetric tensor with components $g_{\mu\nu}$ in some coordinate system $\{x^\mu\}$, where $\mu, \nu \in \{0, 1, 2, 3\}$. [Footnote 8: We will use a popular abuse of notation where $g_{\mu\nu}$ will be called “the metric”, even though the metric is actually a tensor $\mathbf{g}$ which happens to have these components in some coordinate system. Similarly, $x^\mu$ will be called a “vector” even though the vector is actually $\mathbf{x}$, and so on.] It has the line element

\[ ds^2 = g_{\mu\nu} \, dx^\mu \, dx^\nu, \]

and determinant $g \equiv \det \mathbf{g}$. [Footnote 9: Here we are, of course, using the Einstein summation convention, where we automatically sum over any index which is repeated twice: once as an upper index and once as a lower index. So for example, $x_\mu y^\mu \equiv \sum_{\mu=0}^{3} x_\mu y^\mu$ and $g_{\mu\nu} x^\nu \equiv \sum_{\nu=0}^{3} g_{\mu\nu} x^\nu$. The index which appears twice is said to be contracted.]

The expression $ds^2$ may be understood as the infinitesimal and curved version of the expression $\Delta s^2 = -\Delta t^2 + \Delta x^2 + \Delta y^2 + \Delta z^2$ for the spacetime interval between two points in special relativity, where $\Delta$ represents the coordinate difference between the points. In a curved spacetime, the difference between points loses its meaning since one cannot compare vectors at two separate tangent spaces without performing a parallel transport (see Sec. 2.3.1).

The simplest metric is the Minkowski metric, which is diagonal:

\[ \eta_{\mu\nu} \equiv \operatorname{diag}(-1, 1, 1, 1) = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \]

It describes a flat spacetime, and its line element, in the Cartesian coordinates $\{t, x, y, z\}$, is simply

\[ ds^2 = -dt^2 + dx^2 + dy^2 + dz^2. \]

2.1.2 Tangent Spaces and Vectors

At every point $p \in M$ there is a tangent space $T_p M$, consisting of tangent vectors to the manifold at that particular point. For example, if $M$ is a sphere, the tangent space is the plane which touches the sphere at exactly one point $p$, and the tangent vectors are vectors in that plane.

Given two tangent vectors $u^\mu$ and $v^\mu$, the metric imposes the inner product and norm squared

\begin{align}
\langle \mathbf{u}, \mathbf{v} \rangle &\equiv g_{\mu\nu} u^\mu v^\nu, \tag{2.1} \\
|\mathbf{v}|^2 &\equiv \langle \mathbf{v}, \mathbf{v} \rangle = g_{\mu\nu} v^\mu v^\nu. \notag
\end{align}

Since the manifold is not Riemannian, the norm squared is not positive-definite; it can be negative, positive, or even zero for a non-zero vector. Thus it doesn’t really make sense to talk about “the norm” $|\mathbf{v}|$, since it can be imaginary, and when we say “the norm” we will always mean the norm squared. Given a tangent vector $v^\mu$, if $|\mathbf{v}|^2 < 0$ it is called timelike, if $|\mathbf{v}|^2 > 0$ it is called spacelike, and if $|\mathbf{v}|^2 = 0$ it is called lightlike or null. [Footnote 10: Some people use the convention where the metric signature has the opposite sign, $(+,-,-,-)$. In this case, the definitions for timelike and spacelike have the opposite signs as well. Basically, “timelike” means “has a norm squared with the same sign as the time dimension in the metric signature”, and similarly for “spacelike”.]
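The classification of tangent vectors can be made concrete with a short Python sketch. It assumes the Minkowski metric in Cartesian coordinates and the signature $(-,+,+,+)$ used in these notes; the names `ETA`, `inner` and `classify` are hypothetical, and the small tolerance `tol` is only there to absorb floating-point rounding.

```python
# Minkowski metric eta_{mu nu} = diag(-1, 1, 1, 1), signature (-,+,+,+).
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]


def inner(u, v, g=ETA):
    """Inner product <u, v> = g_{mu nu} u^mu v^nu, as in equation (2.1)."""
    return sum(g[m][n] * u[m] * v[n] for m in range(4) for n in range(4))


def classify(v, g=ETA, tol=1e-12):
    """Classify a non-zero tangent vector by the sign of its norm squared."""
    norm_squared = inner(v, v, g)
    if norm_squared < -tol:
        return "timelike"
    if norm_squared > tol:
        return "spacelike"
    return "null"


if __name__ == "__main__":
    print(classify((1, 0, 0, 0)))  # tangent to an observer at rest
    print(classify((0, 1, 0, 0)))  # purely spatial direction
    print(classify((1, 1, 0, 0)))  # tangent to a light ray along x
```

Note that the sketch does not treat the zero vector specially; the definitions above apply to non-zero vectors only.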

Note that we have been using upper indices for vectors. We now define covectors (or 1-forms), which have components with lower indices, and act on vectors to produce a scalar. The metric relates a vector to a covector by lowering an index, for example $u_\nu = g_{\mu\nu} u^\mu$. We then find that the inner product (2.1) may be written as the product of a vector with a covector: $\langle \mathbf{u}, \mathbf{v} \rangle \equiv g_{\mu\nu} u^\mu v^\nu = u_\nu v^\nu$. [Footnote 11: This explains why, in Footnote 9, we said that the summation convention only works if the same index appears once as an upper index and once as a lower index. The summation simply gives us the inner product between the contracted vector and covector, or more generally, between the corresponding components of two tensors. One cannot contract two upper indices, since the inner product in that case would require first adding the metric to the expression.] Similarly, for the metric, we have used two lower indices, and written it as a matrix. The corresponding inverse matrix gives us the components of the inverse metric $g^{\mu\nu}$, which satisfies $g^{\mu\lambda} g_{\lambda\nu} = \delta^\mu_\nu$.

Finally, if we assign a tangent vector to every point in the manifold, then we have a vector field $v^\mu(\mathbf{x})$. Similarly, we may talk about a scalar field $\phi(\mathbf{x})$, which assigns a number to every point, a covector field $v_\mu(\mathbf{x})$, a tensor field $g_{\mu\nu}(\mathbf{x})$, and so on.

2.1.3 Curves and Proper Time

Let $x^\mu(\lambda)$ be a curve (or path, or worldline) parametrized by some real parameter $\lambda$. For each value of $\lambda$, $x^\mu(\lambda)$ is a point on the manifold. This point has a tangent space, and the tangent vector to the curve, defined as

\[ \dot{x}^\mu \equiv \frac{dx^\mu}{d\lambda}, \]

is a vector in this tangent space.

If $x^\mu(\lambda)$ describes the worldline of a massive particle, then its tangent vector is timelike everywhere; $|\dot{\mathbf{x}}(\lambda)|^2 < 0$ for all $\lambda$. In this case, we may calculate the proper time along the path, which is the coordinate-independent time as measured by a clock moving with the particle, and thus is an observable; in contrast, the coordinate time $x^0 \equiv t$ is not an observable since it, of course, depends on the arbitrary choice of coordinates.

The square of the differential of proper time is defined as minus the line element, that is, $d\tau^2 \equiv -ds^2$, or

\[ d\tau^2 = -g_{\mu\nu} \, dx^\mu \, dx^\nu. \]

[Footnote 12: The minus sign comes from the fact that the time dimension has a minus sign in the metric signature. Thus, if we want the difference in proper time between a past event and a future event to be positive, we must cancel that minus sign.]

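Employing a slight abuse of notation, we may “divide” this expression by $d\lambda^2$, where $\lambda$ is the (arbitrary) curve parameter:

\begin{align}
\frac{d\tau^2}{d\lambda^2} &= \left( \frac{d\tau}{d\lambda} \right)^2 \tag{2.2} \\
&= -g_{\mu\nu} \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda} \tag{2.3} \\
&= -g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu \tag{2.4} \\
&= -|\dot{\mathbf{x}}|^2. \tag{2.5}
\end{align}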

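Thus, we learn that

\[ \frac{d\tau}{d\lambda} = \sqrt{-|\dot{\mathbf{x}}|^2} \quad \Longrightarrow \quad d\tau = \sqrt{-|\dot{\mathbf{x}}|^2} \, d\lambda, \]

where the expression inside the square root is positive since $|\dot{\mathbf{x}}|^2 < 0$ for a timelike path. Now we can find the total proper time $\tau$ along a path (that is, from one value of $\lambda$ to a subsequent value of $\lambda$) by integrating this differential:

\begin{align}
\tau &\equiv \int d\tau \tag{2.6} \\
&= \int \sqrt{-|\dot{\mathbf{x}}|^2} \, d\lambda \tag{2.7} \\
&= \int \sqrt{-g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu} \, d\lambda. \tag{2.8}
\end{align}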

This allows us, in principle, to calculate $\tau$ as a function of $\lambda$, and then use the proper time in place of $\lambda$ as the parameter for our curve. When using the proper time as a parameter – and only then! – the tangent vector $\dot{x}^\mu \equiv dx^\mu / d\tau$ is called the 4-velocity, and it is automatically normalized to $|\dot{\mathbf{x}}|^2 = -1$. [Footnote 13: Indeed, using the chain rule and (2.2), we have $|\dot{\mathbf{x}}|^2 = g_{\mu\nu} \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} = g_{\mu\nu} \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda} \Big/ \left( \frac{d\tau}{d\lambda} \right)^2 = \frac{g_{\mu\nu} \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda}}{-g_{\mu\nu} \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda}} = -1$.] From now on, unless stated otherwise, we will always parametrize timelike paths with proper time, since the normalization $|\dot{\mathbf{x}}|^2 = -1$ simplifies many calculations, and allows us to talk about 4-velocity in a well-defined way.
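The proper time integral (2.6) can be approximated numerically by chopping the worldline into short straight segments and summing $\sqrt{-\Delta s^2}$ over them. The sketch below does this in flat spacetime with the Minkowski metric; the function name `proper_time` and the sample worldline are illustrative assumptions rather than anything defined in the notes.

```python
import math


def proper_time(worldline, lam0, lam1, steps=10000):
    """Approximate tau = integral of sqrt(-g_{mu nu} xdot^mu xdot^nu) dlambda.

    Assumes flat spacetime with metric diag(-1, 1, 1, 1). `worldline(lam)`
    must return the coordinates (t, x, y, z) at parameter value lam.
    """
    h = (lam1 - lam0) / steps
    tau = 0.0
    previous = worldline(lam0)
    for i in range(1, steps + 1):
        current = worldline(lam0 + i * h)
        dt, dx, dy, dz = (c - p for c, p in zip(current, previous))
        interval = -dt * dt + dx * dx + dy * dy + dz * dz  # ds^2 of this segment
        if interval > 0:
            raise ValueError("worldline is not timelike on this segment")
        tau += math.sqrt(-interval)  # d(tau)^2 = -ds^2
        previous = current
    return tau


if __name__ == "__main__":
    # Constant speed 0.6 along x, with the deliberately odd parametrization
    # t = lam^2, to show that the result does not depend on the parameter.
    moving = lambda lam: (lam * lam, 0.6 * lam * lam, 0.0, 0.0)
    print(proper_time(moving, 0.0, 2.0))
```

In this example the coordinate time elapsed is $\Delta t = 4$ and the speed is $0.6$, so the proper time should come out as $4\sqrt{1 - 0.6^2} = 3.2$ regardless of how the curve is parametrized.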

2.1.4 Massless Particles

So far we have discussed massive particles, whose worldlines are timelike. The worldline of a massless particle, on the other hand, is a null path. For a null path we cannot define a proper time, since by definition its tangent is a null vector, so $|\dot{\mathbf{x}}(\lambda)|^2 = 0$ for any choice of parameter $\lambda$ (although, of course, the tangent vector itself is not the zero vector). Therefore, from (2.6) we have that $\tau = 0$ between any two points on the path. This is why we sometimes say that massless particles, such as photons, “do not experience the passage of time”; the proper time along their worldlines vanishes.

This means that for null paths, there is no preferred parameter; however, as we will see below, null geodesics possess a family of preferred parameters called affine parameters. Furthermore, note that massless particles do not have a well-defined 4-velocity, since the definition of 4-velocity makes use of the proper time.

2.2 Covariant Derivatives and Connections

2.2.1 Defining Tensors

A function $\mathbf{T}$ defined on a manifold is called a tensor of rank $(p, q)$ if its components have $p$ upper indices and $q$ lower indices, $T^{\mu_1 \cdots \mu_p}_{\nu_1 \cdots \nu_q}$, and if, under a transformation from a coordinate system $\{x^\mu\}$ to another one $\{x^{\mu'}\}$, each upper index $\mu_i$ receives a factor of $\partial x^{\mu_i'} / \partial x^{\mu_i}$ and each lower index $\nu_i$ receives a factor of $\partial x^{\nu_i} / \partial x^{\nu_i'}$ (note that the original and primed coordinates switch places). Vectors and covectors are specific cases of tensors, with rank $(1, 0)$ and $(0, 1)$ respectively.

For example, the components of a vector $v^\mu$, a covector $u_\mu$ and a rank $(0, 2)$ tensor $g_{\mu\nu}$ transform as follows:

\begin{align}
v^{\mu'} &= \frac{\partial x^{\mu'}}{\partial x^\mu} v^\mu, \notag \\
u_{\mu'} &= \frac{\partial x^\mu}{\partial x^{\mu'}} u_\mu, \notag \\
g_{\mu'\nu'} &= \frac{\partial x^\mu}{\partial x^{\mu'}} \frac{\partial x^\nu}{\partial x^{\nu'}} g_{\mu\nu}. \notag
\end{align}

[Footnote 14: You don’t need to remember where the primes are placed, since they are located in the only place that makes sense for contracting with the components of the tensor, taking into account that an upper index in the denominator, e.g. the $\mu$ on $\partial / \partial x^\mu$, counts as a lower index – since $\partial_\mu \equiv \partial / \partial x^\mu$ – and thus should be contracted with an upper index.]

If tensors transform this way, we say that they are covariant or transform covariantly. This is very important, since the abstract quantities $\mathbf{v}$, $\mathbf{u}$ and $\mathbf{g}$ are guaranteed to be invariant under the transformation if and only if their components transform exactly in this way.
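As a quick sanity check of these transformation rules, the following Python sketch uses a Lorentz boost along $x$ as the coordinate change, so that the Jacobian $\partial x^{\mu'} / \partial x^\mu$ is a constant matrix. The choice of a boost and all function names are illustrative assumptions, not taken from the notes. The sketch transforms a vector, a covector and the Minkowski metric, and the contraction $u_\mu v^\mu$ should come out the same in both coordinate systems.

```python
import math


def boost(v):
    """Jacobian J[m][n] = dx'^m / dx^n for a boost with speed v along x (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, -gamma * v, 0.0, 0.0],
            [-gamma * v, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def transform_vector(jac, vec):
    """Upper index: v'^m = (dx'^m / dx^n) v^n."""
    return [sum(jac[m][n] * vec[n] for n in range(4)) for m in range(4)]


def transform_covector(jac_inv, cov):
    """Lower index: u'_m = (dx^n / dx'^m) u_n, with jac_inv[n][m] = dx^n / dx'^m."""
    return [sum(jac_inv[n][m] * cov[n] for n in range(4)) for m in range(4)]


def transform_metric(jac_inv, g):
    """Two lower indices: g'_{mn} = (dx^a / dx'^m)(dx^b / dx'^n) g_{ab}."""
    return [[sum(jac_inv[a][m] * jac_inv[b][n] * g[a][b]
                 for a in range(4) for b in range(4))
             for n in range(4)] for m in range(4)]


def contract(cov, vec):
    """The scalar u_mu v^mu."""
    return sum(c * x for c, x in zip(cov, vec))


if __name__ == "__main__":
    jac, jac_inv = boost(0.6), boost(-0.6)  # the inverse boost has speed -v
    vec, cov = [1.0, 0.5, 0.2, 0.0], [2.0, -1.0, 0.0, 3.0]
    print(contract(cov, vec))
    print(contract(transform_covector(jac_inv, cov), transform_vector(jac, vec)))
```

The same script can be used to confirm that the Minkowski metric keeps its components under a boost, which is the statement that boosts are symmetries of flat spacetime.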

2.2.2 Covariant Derivatives

The standard way to define differentiation on a manifold is by using the covariant derivative $\nabla_\mu$, which generalizes the usual partial derivative $\partial_\mu \equiv \partial / \partial x^\mu$. [Footnote 15: Some sources use a notation where a comma indicates partial derivatives: $\partial_\mu u_\nu \equiv u_{\nu,\mu}$, and a semicolon indicates covariant derivatives: $\nabla_\mu u_\nu \equiv u_{\nu;\mu}$. We will not use this notation here.] It is defined as follows:

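  • On a scalar field $\phi$: $\nabla_\mu \phi \equiv \partial_\mu \phi$.

  • On a vector field $v^\nu$: $\nabla_\mu v^\nu \equiv \partial_\mu v^\nu + \Gamma^\nu_{\mu\lambda} v^\lambda$.

  • On a covector $u_\nu$: $\nabla_\mu u_\nu \equiv \partial_\mu u_\nu - \Gamma^\lambda_{\mu\nu} u_\lambda$.

  • On a rank $(p, q)$ tensor $\mathbf{T}$ with components $T^{\nu_1 \cdots \nu_p}_{\sigma_1 \cdots \sigma_q}$: The first term of $\nabla_\mu T^{\nu_1 \cdots \nu_p}_{\sigma_1 \cdots \sigma_q}$ will be $\partial_\mu T^{\nu_1 \cdots \nu_p}_{\sigma_1 \cdots \sigma_q}$, and to that we add one term $\Gamma^{\nu_i}_{\mu\lambda} T^{\nu_1 \cdots \lambda \cdots \nu_p}_{\sigma_1 \cdots \sigma_q}$ for each upper index $\nu_i$, and subtract one term $\Gamma^\lambda_{\mu\sigma_i} T^{\nu_1 \cdots \nu_p}_{\sigma_1 \cdots \lambda \cdots \sigma_q}$ for each lower index $\sigma_i$, generalizing the expressions above.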

$\Gamma^\lambda_{\mu\nu}$ are called the connection coefficients. Importantly, $\Gamma^\lambda_{\mu\nu}$ are not the components of a tensor, since they do not transform covariantly under a change of coordinates. However, the partial derivative itself does not transform as a tensor either, and it just so happens that the unwanted terms from each of the transformations exactly cancel each other, so while $\partial_\mu T^{\nu_1 \cdots \nu_p}_{\sigma_1 \cdots \sigma_q}$ is not a tensor, the covariant derivative $\nabla_\mu T^{\nu_1 \cdots \nu_p}_{\sigma_1 \cdots \sigma_q}$ of any tensor is itself a tensor, of rank $(p, q + 1)$. This is exactly why covariant derivatives are so important: partial derivatives do not transform covariantly in a general curved spacetime, and thus do not generate tensors, but covariant derivatives do.

2.2.3 The Levi-Civita Connection

Just like the partial derivative, the covariant derivative satisfies linearity and the Leibniz rule. It also commutes with contractions, meaning that if we contract two indices of a tensor, e.g. $T^{\mu\lambda}{}_{\lambda\nu} \equiv g_{\lambda\rho} T^{\mu\lambda\rho}{}_{\nu}$, and then apply the covariant derivative, then in the new tensor (of one higher rank) that we have formed, the same two indices will still be contracted: $\nabla_\sigma T^{\mu\lambda}{}_{\lambda\nu} = g_{\lambda\rho} \nabla_\sigma T^{\mu\lambda\rho}{}_{\nu}$. However, the covariant derivative is not unique since it depends on the choice of $\Gamma^\lambda_{\mu\nu}$. A unique choice of connection coefficients may be obtained by requiring that the connection is also:

  • Torsion-free, meaning that the torsion tensor $T^\lambda{}_{\mu\nu} \equiv \Gamma^\lambda_{\mu\nu} - \Gamma^\lambda_{\nu\mu} \equiv 2 \Gamma^\lambda_{[\mu\nu]}$ vanishes, or equivalently, the connection coefficients are symmetric in their lower indices: $\Gamma^\lambda_{\mu\nu} = \Gamma^\lambda_{\nu\mu}$.

  • Metric-compatible, that is, the covariant derivative of the metric vanishes: $\nabla_\lambda g_{\mu\nu} = 0$. Note that this is a stronger condition than commuting with contractions, since we can also write $T^{\mu\lambda}{}_{\lambda\nu} \equiv \delta^\rho_\lambda T^{\mu\lambda}{}_{\rho\nu}$, so commuting with contractions merely implies that $\nabla_\mu \delta^\rho_\lambda = 0$, which one would expect given that the Kronecker delta $\delta^\rho_\lambda$ is just the identity matrix.

With these additional constraints, there is one unique connection, called the Levi-Civita connection, whose coefficients are sometimes called Christoffel symbols, and are given as a function of the metric by:

\[ \Gamma^\lambda_{\mu\nu} = \frac{1}{2} g^{\lambda\sigma} \left( \partial_\mu g_{\nu\sigma} + \partial_\nu g_{\mu\sigma} - \partial_\sigma g_{\mu\nu} \right). \tag{2.9} \]

The reader is encouraged to prove this and all other unproven claims in this section.
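While a proof requires pen and paper, formula (2.9) is easy to test numerically. The sketch below evaluates the Christoffel symbols with central finite differences, under the simplifying assumption that the metric is diagonal (so that the inverse metric is just $1/g_{\lambda\lambda}$ on the diagonal). As a test case it uses the unit 2-sphere with coordinates $(\theta, \phi)$ and metric $\operatorname{diag}(1, \sin^2\theta)$, for which the standard non-zero symbols are $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta$. The names `christoffel` and `sphere_metric` are hypothetical.

```python
import math


def christoffel(metric, x, h=1e-5):
    """Return gamma[lam][mu][nu] = Gamma^lam_{mu nu} at the point x, from (2.9).

    Assumption: `metric(x)` returns a DIAGONAL matrix g_{mu nu}, so that
    the inverse metric is g^{lam lam} = 1 / g_{lam lam}. Partial derivatives
    are approximated by central differences with step h.
    """
    n = len(x)

    def d(s, a, b):
        """Central-difference estimate of the partial derivative d_s g_{ab}."""
        plus, minus = list(x), list(x)
        plus[s] += h
        minus[s] -= h
        return (metric(plus)[a][b] - metric(minus)[a][b]) / (2.0 * h)

    g = metric(x)
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for lam in range(n):
        g_inv = 1.0 / g[lam][lam]  # only sigma = lam contributes for a diagonal metric
        for mu in range(n):
            for nu in range(n):
                gamma[lam][mu][nu] = 0.5 * g_inv * (
                    d(mu, nu, lam) + d(nu, mu, lam) - d(lam, mu, nu))
    return gamma


def sphere_metric(x):
    """Metric of the unit 2-sphere in coordinates x = (theta, phi)."""
    theta, _phi = x
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]


if __name__ == "__main__":
    gamma = christoffel(sphere_metric, [1.0, 0.3])
    print(gamma[0][1][1], -math.sin(1.0) * math.cos(1.0))  # Gamma^theta_{phi phi}
    print(gamma[1][0][1], math.cos(1.0) / math.sin(1.0))   # Gamma^phi_{theta phi}
```

A non-diagonal metric would need a proper matrix inverse in place of `1.0 / g[lam][lam]`; that step is left out here to keep the sketch short.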

2.3 Parallel Transport and Geodesics

2.3.1 Parallel Transport

Since each point on the manifold has its own tangent space, it is unclear how to relate vectors (or more generally, tensors) at different points, since they belong to different vector spaces. Again, imagine the sphere, with the tangent spaces being planes which touch it only at one point. A tangent vector to one plane, if moved to another point, will not be tangent to the plane at that point.

A unique way of taking a tensor from one point to another point on the manifold along a given curve is given by parallel transport. Let $x^\mu(\lambda)$ be a curve, and let $v^\mu$ be a vector whose value is known on a particular point on the curve (e.g. at $\lambda = 0$). To parallel transport $v^\mu$ to another point along the curve (e.g. $\lambda = 1$), we solve the equation

\[ \dot{x}^\mu \nabla_\mu v^\nu = 0, \]

where $\dot{x}^\mu \equiv dx^\mu / d\lambda$. Using the definition of the covariant derivative, this may be written explicitly as

\[ \dot{x}^\mu \partial_\mu v^\sigma + \Gamma^\sigma_{\mu\nu} \dot{x}^\mu v^\nu = 0. \]

Furthermore, using the chain rule $\frac{dx^\mu}{d\lambda} \partial_\mu = \frac{d}{d\lambda}$, we may write this as

\[ \frac{d}{d\lambda} v^\sigma + \Gamma^\sigma_{\mu\nu} \dot{x}^\mu v^\nu = 0. \]

Similar parallel transport equations are easily obtained for tensors of any rank.
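To see the parallel transport equation in action, the sketch below integrates it numerically on the unit 2-sphere, the example used above for tangent spaces. The curve is the circle of constant latitude $\theta = \theta_0$, parametrized by $\phi = \lambda$, so that $\dot{x}^\mu = (0, 1)$. The Christoffel symbols $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \cot\theta$ are the standard ones for the sphere metric $\operatorname{diag}(1, \sin^2\theta)$; the function names and the use of a fourth-order Runge-Kutta integrator are illustrative choices.

```python
import math


def rk4_step(f, y, h):
    """One classical Runge-Kutta step for dy/dlambda = f(y), with y a tuple."""
    k1 = f(y)
    k2 = f(tuple(a + 0.5 * h * k for a, k in zip(y, k1)))
    k3 = f(tuple(a + 0.5 * h * k for a, k in zip(y, k2)))
    k4 = f(tuple(a + h * k for a, k in zip(y, k3)))
    return tuple(a + h / 6.0 * (p + 2 * q + 2 * r + s)
                 for a, p, q, r, s in zip(y, k1, k2, k3, k4))


def transport_around_latitude(theta0, v0, steps=2000):
    """Parallel transport v0 = (v^theta, v^phi) once around theta = theta0."""
    sin0, cos0 = math.sin(theta0), math.cos(theta0)

    def rhs(v):
        # dv^sigma/dlambda = -Gamma^sigma_{mu nu} xdot^mu v^nu with xdot = (0, 1):
        #   dv^theta/dlambda = +sin(theta0) cos(theta0) v^phi
        #   dv^phi/dlambda   = -cot(theta0) v^theta
        v_theta, v_phi = v
        return (sin0 * cos0 * v_phi, -(cos0 / sin0) * v_theta)

    h = 2.0 * math.pi / steps
    v = tuple(v0)
    for _ in range(steps):
        v = rk4_step(rhs, v, h)
    return v


def norm_squared(theta0, v):
    """|v|^2 = g_{mu nu} v^mu v^nu with the sphere metric diag(1, sin^2 theta)."""
    return v[0] ** 2 + math.sin(theta0) ** 2 * v[1] ** 2


if __name__ == "__main__":
    theta0 = math.pi / 3
    v = transport_around_latitude(theta0, (1.0, 0.0))
    print(v, norm_squared(theta0, v))
```

In this setup the transported vector comes back rotated by an angle $2\pi\cos\theta_0$ relative to where it started, even though its norm is unchanged: for $\theta_0 = \pi/3$ the rotation is by $\pi$, so $(1, 0)$ returns as approximately $(-1, 0)$. This path dependence is a direct signature of curvature.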

2.3.2 Geodesics and Affine Parameters

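Let us apply parallel transport to the tangent vector to the same curve we are parallel-transporting along, that is, take $v^\nu \equiv \dot{x}^\nu$. Then we get:

\begin{align}
\dot{x}^\mu \nabla_\mu \dot{x}^\sigma &= 0 \tag{2.10} \\
\Longrightarrow \quad \ddot{x}^\sigma + \Gamma^\sigma_{\mu\nu} \dot{x}^\mu \dot{x}^\nu &= 0. \notag
\end{align}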

This equation is called the geodesic equation, and it generalizes the notion of a “straight line” to a curved space. Indeed, for the flat Minkowski space, we have $\Gamma^\sigma_{\mu\nu} = 0$ and thus the geodesics are given by curves satisfying $\ddot{\mathbf{x}} = 0$, which describe straight lines.

Equation (2.10) demands that the vector tangent to the curve is parallel transported in the direction of the curve. This means that the resulting vector must have the same direction and magnitude. We can, in fact, weaken this condition and allow the resulting vector to have a different magnitude, while still demanding that its direction remains unchanged. This still captures the idea behind geodesics, namely that they are a generalization of straight lines in flat space. The resulting equation takes the form:

\[ \dot{x}^\mu \nabla_\mu \dot{x}^\sigma = \alpha \dot{x}^\sigma, \tag{2.11} \]

where $\alpha$ is some function on the curve. However, any curve which satisfies this equation may be reparametrized so that it satisfies (2.10). Indeed, (2.11) may be written explicitly as follows:

\[ \frac{d^2 x^\sigma}{d\lambda^2} + \Gamma^\sigma_{\mu\nu} \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda} = \alpha \frac{dx^\sigma}{d\lambda}. \]

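Now, consider a curve which solves (2.11) with some parameter $\lambda$ and introduce a new parameter $\mu(\lambda)$ (in what follows, $\mu$ denotes this parameter whenever it appears in a derivative such as $d/d\mu$, and a summed index otherwise). Then we have

\[ \frac{d}{d\lambda} = \frac{d\mu}{d\lambda} \frac{d}{d\mu} \quad \Longrightarrow \quad \frac{d^2}{d\lambda^2} = \frac{d^2\mu}{d\lambda^2} \frac{d}{d\mu} + \left( \frac{d\mu}{d\lambda} \right)^2 \frac{d^2}{d\mu^2}, \]

and we may rewrite the equation as follows:

\[ \frac{d^2\mu}{d\lambda^2} \frac{dx^\sigma}{d\mu} + \left( \frac{d\mu}{d\lambda} \right)^2 \frac{d^2 x^\sigma}{d\mu^2} + \Gamma^\sigma_{\mu\nu} \left( \frac{d\mu}{d\lambda} \right)^2 \frac{dx^\mu}{d\mu} \frac{dx^\nu}{d\mu} = \alpha \frac{d\mu}{d\lambda} \frac{dx^\sigma}{d\mu}. \]

Rearranging, we get

\[ \frac{d^2 x^\sigma}{d\mu^2} + \Gamma^\sigma_{\mu\nu} \frac{dx^\mu}{d\mu} \frac{dx^\nu}{d\mu} = \left( \frac{d\lambda}{d\mu} \right)^2 \left( \alpha \frac{d\mu}{d\lambda} - \frac{d^2\mu}{d\lambda^2} \right) \frac{dx^\sigma}{d\mu}. \]

Therefore the right-hand side will vanish if $\mu(\lambda)$ is a solution to the differential equation

\[ \frac{d^2\mu}{d\lambda^2} = \alpha(\lambda) \frac{d\mu}{d\lambda}. \tag{2.12} \]

Such a solution always exists, and thus we have obtained the desired parameterization. The parameter $\mu$ for which the geodesic equation reduces to the form (2.10) is called an affine parameter. Note that this is in fact a whole family of parameters, since any other parameter given by $\nu \equiv A\mu + B$, with $A \neq 0$ and $B$ real constants, is also an affine parameter, as can be easily seen from (2.12).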

The geodesic equation is one of the two most important equations in general relativity; the other is Einstein’s equation, which we will discuss in Sec. 2.6 .
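The geodesic equation is a system of second-order ordinary differential equations, so it can be integrated numerically once the connection coefficients are known. As a minimal sketch, the code below does this on the unit 2-sphere, where geodesics are known to be great circles; the standard sphere Christoffel symbols turn the geodesic equation into $\ddot{\theta} = \sin\theta\cos\theta \, \dot{\phi}^2$ and $\ddot{\phi} = -2\cot\theta \, \dot{\theta}\dot{\phi}$. The function names and the Runge-Kutta integrator are illustrative assumptions.

```python
import math


def rk4_step(f, y, h):
    """One classical Runge-Kutta step for dy/dlambda = f(y), with y a tuple."""
    k1 = f(y)
    k2 = f(tuple(a + 0.5 * h * k for a, k in zip(y, k1)))
    k3 = f(tuple(a + 0.5 * h * k for a, k in zip(y, k2)))
    k4 = f(tuple(a + h * k for a, k in zip(y, k3)))
    return tuple(a + h / 6.0 * (p + 2 * q + 2 * r + s)
                 for a, p, q, r, s in zip(y, k1, k2, k3, k4))


def geodesic_rhs(state):
    """Geodesic equation on the unit 2-sphere as a first-order system.

    state = (theta, phi, dtheta, dphi); accelerations follow from
    xddot^sigma = -Gamma^sigma_{mu nu} xdot^mu xdot^nu.
    """
    theta, _phi, dtheta, dphi = state
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2
    ddphi = -2.0 * (math.cos(theta) / math.sin(theta)) * dtheta * dphi
    return (dtheta, dphi, ddtheta, ddphi)


def integrate_geodesic(state, lam_end, steps=2000):
    """Integrate from lambda = 0 to lam_end, starting from `state`."""
    h = lam_end / steps
    for _ in range(steps):
        state = rk4_step(geodesic_rhs, state, h)
    return state


def speed_squared(state):
    """|xdot|^2 = dtheta^2 + sin^2(theta) dphi^2, constant for an affine parameter."""
    theta, _phi, dtheta, dphi = state
    return dtheta ** 2 + math.sin(theta) ** 2 * dphi ** 2


if __name__ == "__main__":
    # Start on the equator with a unit tangent vector tilted away from it.
    start = (math.pi / 2, 0.0, 0.6, 0.8)
    end = integrate_geodesic(start, math.pi)
    print(end[0], end[1], speed_squared(end))
```

Since the initial tangent vector has unit norm, $\lambda$ measures arc length, and after $\lambda = \pi$ (half a great circle) the curve should arrive at the antipodal point $(\theta, \phi) = (\pi/2, \pi)$ with $|\dot{\mathbf{x}}|^2$ still equal to 1.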

2.3.3 Massive Particles and Geodesics

A test particle is a particle which has a negligible effect on the curvature of spacetime. Such a particle’s path will always be a timelike geodesic if it’s a massive particle (such as an electron), or a null geodesic if it’s a massless particle (such as a photon), as long as it is in free fall – meaning that no forces act on it other than gravity.

For a massive particle with (rest) mass $m$ and 4-velocity $\dot{x}^\mu = dx^\mu / d\tau$, the 4-momentum is given by $p^\mu \equiv m \dot{x}^\mu$. [Footnote 16: Some textbooks also define a “relativistic mass” which depends on the velocity or frame of reference. However, this concept is not useful in general relativity (and even in special relativity it mostly just causes confusion). In these notes, as in most of the theoretical physics literature, the rest mass $m$ is assumed to be a constant which is assigned to the particle once and for all. For example, the electron has roughly $m = 511\ \mathrm{keV}$, or the pure number $m = 4.2 \times 10^{-23}$ in Planck units, independently of its velocity or reference frame.] Recall again that the 4-velocity is only defined if the curve is parametrized by the proper time $\tau$. Massless particles have neither mass, nor 4-velocity, since proper time is undefined for a null geodesic. Therefore the definition $p^\mu \equiv m \dot{x}^\mu$ would not make sense, and we simply define the 4-momentum as the tangent vector with respect to some affine parameter: $p^\mu \equiv dx^\mu / d\lambda$.

We will sometimes write the geodesic equation in terms of the particle’s 4-momentum as follows:

\[ p^\nu \nabla_\nu p^\mu = 0. \]

In other words, an unaccelerated (free-falling) particle keeps moving in the direction of its momentum. An observer moving with 4-velocity $u^\mu$ then measures the energy of the particle to be

\[ E \equiv -p_\mu u^\mu. \]

As a simple example, consider a flat spacetime with the Minkowski metric $\eta_{\mu\nu} \equiv \operatorname{diag}(-1, 1, 1, 1)$. A massive particle with 4-momentum $p^\mu \equiv m \dot{x}^\mu$ is measured by an observer at rest, that is, with 4-velocity $u^\mu = (1, 0, 0, 0)$. The energy measured will then be

\begin{align}
E &= -\eta_{\mu\nu} p^\mu u^\nu \notag \\
&= -\eta_{00} p^0 u^0 \notag \\
&= p^0. \notag
\end{align}

In other words, in this case the energy is simply the time component of the 4-momentum. Motivated by this result, we take the 4-momentum to be of the form $p^\mu \equiv (E, \vec{p})$, where $\vec{p} \equiv (p^1, p^2, p^3)$ is the (spatial) 3-momentum. [Footnote 17: We will use bold font, $\mathbf{v}$, for spacetime 4-vectors and an arrow, $\vec{v}$, for spatial 3-vectors.] Since $|\dot{\mathbf{x}}|^2 = -1$, the norm of the 4-momentum is given by

\[ |\mathbf{p}|^2 = m^2 |\dot{\mathbf{x}}|^2 = -m^2. \]

On the other hand, by direct calculation we find

\[ |\mathbf{p}|^2 = -E^2 + |\vec{p}|^2, \tag{2.13} \]

where $|\vec{p}|^2 \equiv (p^1)^2 + (p^2)^2 + (p^3)^2$. Comparing both expressions, we see that

\[ E^2 = m^2 + |\vec{p}|^2. \tag{2.14} \]

In the rest frame of the particle, where $\vec{p} = 0$, this reduces to the familiar mass-energy equivalence equation $E = mc^2$ (with $c = 1$).
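The relations above are easy to verify numerically in flat spacetime. The sketch below builds the 4-momentum of a massive particle from its 3-velocity, using the standard special-relativistic result that the 4-velocity is $\gamma (1, \vec{v})$ (a fact not derived in these notes), and then evaluates $E = -p_\mu u^\mu$ and $|\mathbf{p}|^2$. All function names are illustrative.

```python
import math

# Diagonal of the Minkowski metric eta_{mu nu} = diag(-1, 1, 1, 1).
ETA = (-1.0, 1.0, 1.0, 1.0)


def four_momentum(m, velocity):
    """p^mu = m dx^mu/dtau = m * gamma * (1, vx, vy, vz), in units where c = 1."""
    speed_squared = sum(c * c for c in velocity)
    gamma = 1.0 / math.sqrt(1.0 - speed_squared)
    return (m * gamma,) + tuple(m * gamma * c for c in velocity)


def measured_energy(p, u):
    """E = -p_mu u^mu = -eta_{mu nu} p^mu u^nu for an observer with 4-velocity u."""
    return -sum(e * a * b for e, a, b in zip(ETA, p, u))


def norm_squared(p):
    """|p|^2 = eta_{mu nu} p^mu p^nu = -E^2 + |p_vec|^2, as in (2.13)."""
    return sum(e * a * a for e, a in zip(ETA, p))


if __name__ == "__main__":
    m = 2.0
    p = four_momentum(m, (0.6, 0.0, 0.0))
    energy = measured_energy(p, (1.0, 0.0, 0.0, 0.0))  # observer at rest
    momentum_squared = sum(c * c for c in p[1:])
    print(energy, norm_squared(p), energy ** 2 - (m ** 2 + momentum_squared))
```

With $m = 2$ and speed $0.6$, the observer at rest measures $E = 2.5$, the norm squared is $-m^2 = -4$, and the last printed number, which tests (2.14), should vanish up to rounding.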

2.3.4 Massless Particles and the Speed of Light

For massless particles, the situation is simpler. We again take $p^\mu \equiv (E, \vec{p})$, so an observer at rest with $u^\mu = (1, 0, 0, 0)$ will measure the energy $E$. Furthermore, since by definition $p^\mu \equiv \dot{x}^\mu$ and $\dot{x}^\mu$ is null, we have

\[ |\mathbf{p}|^2 = |\dot{\mathbf{x}}|^2 = 0, \]

and combining with (2.13) we find that $E^2 = |\vec{p}|^2$. Again, this is the familiar equation $E = pc$, with $c = 1$. We conclude that the relation (2.14) applies to all particles, whether massive or massless.

Even though a massless particle doesn’t have a 4-velocity, we know that it locally (i.e. at the same point as the observer) always moves at the speed of light, $v = c = 1$. This is true both in special and general relativity. It is easy to see in a flat spacetime. The line element is

\[ ds^2 = -dt^2 + dx^2 + dy^2 + dz^2. \]

Let us assume the massless particle is moving at this instant in the $x$ direction (if it isn’t, then we can rotate our coordinate system until it is). Then, since the $y$ and $z$ coordinates remain unchanged, we have $dy = dz = 0$. Furthermore, since the particle is moving along a null path, it has $ds^2 = 0$ as well. In conclusion, we have

\[ 0 = -dt^2 + dx^2, \]

which may be rearranged into

\[ \frac{dx}{dt} = \pm 1. \]

In other words, the particle is moving at the speed of light, $c = 1$, either in the positive or negative $x$ direction.

However, if the particle were massive, we would have $ds^2 < 0$ since it is moving along a timelike path. Therefore we have

\[ ds^2 = -dt^2 + dx^2 < 0, \]

which may be rearranged into

\[ \left| \frac{dx}{dt} \right| < 1. \]

Hence, the particle is necessarily moving locally at strictly slower than the speed of light.

So far, we have only considered special relativity. In general relativity we have an arbitrarily curved spacetime, and naively, it seems that massless particles may move at any speed. This is easy to see by considering, for example, the following metric:

\[ ds^2 = -V^2 dt^2 + dx^2, \]

where $V$ is some real number. Then for a massless particle we have $ds^2 = 0$ and thus

\[ \frac{dx}{dt} = \pm V, \]

so the speed of the particle is given by the arbitrary number $V$. The problem here is that we have calculated the coordinate speed of the particle, not its local speed. General relativity works in any coordinate system – this is called general covariance or diffeomorphism invariance – and the coordinate speed will naturally depend on the choice of coordinate system.
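The difference between coordinate speed and local speed can be illustrated with a few lines of Python for the metric $ds^2 = -V^2 dt^2 + dx^2$. The sketch assumes a diagonal two-dimensional metric and an observer sitting at fixed $x$, whose clock ticks proper time $\sqrt{-g_{tt}} \, dt$ and whose ruler measures proper distance $\sqrt{g_{xx}} \, dx$; the function names are hypothetical.

```python
import math


def null_coordinate_speed(g_tt, g_xx):
    """|dx/dt| on a null path of ds^2 = g_tt dt^2 + g_xx dx^2 (g_tt < 0 < g_xx)."""
    return math.sqrt(-g_tt / g_xx)


def local_speed(coordinate_speed, g_tt, g_xx):
    """Speed seen by an observer at fixed x: proper distance per unit proper time.

    Assumption: diagonal metric, so d(tau) = sqrt(-g_tt) dt for the observer
    and the proper distance along x is sqrt(g_xx) dx.
    """
    return coordinate_speed * math.sqrt(g_xx) / math.sqrt(-g_tt)


if __name__ == "__main__":
    for V in (0.1, 1.0, 3.0):
        g_tt, g_xx = -V * V, 1.0
        coordinate = null_coordinate_speed(g_tt, g_xx)
        print(V, coordinate, local_speed(coordinate, g_tt, g_xx))
```

Whatever value of $V$ is chosen, the coordinate speed comes out as $V$ while the locally measured speed is 1. Equivalently, the rescaled time coordinate $T \equiv V t$ brings the metric to the Minkowski form $ds^2 = -dT^2 + dx^2$.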

The fact that massless particles always locally move at the speed of light easily follows from the well-known result that at a particular point $p$ it is always possible to transform to locally inertial coordinates, which have the property that $g_{\mu\nu}(p) = \eta_{\mu\nu}$ and $\partial_\sigma g_{\mu\nu}(p) = 0$. [Footnote 18: See e.g. [6] for the details of how to construct such coordinates.] Then, as far as the observer at $p$ is concerned, spacetime is completely flat in their immediate vicinity, and thus they will see massless particles passing by at the speed of light.

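Finally, it is important to note that if a particle starts moving along a timelike or null geodesic it can never suddenly switch to moving along a different type of geodesic or path; this is simply due to the mathematical fact that parallel transport preserves the norm of the tangent vector to the path. Indeed, if the tangent vector being parallel transported is $v^\mu$, then $\dot{x}^\mu \nabla_\mu v^\nu = 0$ and thus, using metric compatibility $\nabla_\mu g_{\alpha\beta} = 0$ and the Leibniz rule,

\begin{align}
\dot{x}^\mu \nabla_\mu \left( |\mathbf{v}|^2 \right) &= g_{\alpha\beta} \dot{x}^\mu \nabla_\mu \left( v^\alpha v^\beta \right) \tag{2.15} \\
&= g_{\alpha\beta} \left( v^\beta \left( \dot{x}^\mu \nabla_\mu v^\alpha \right) + v^\alpha \left( \dot{x}^\mu \nabla_\mu v^\beta \right) \right) = 0, \tag{2.16}
\end{align}

so $|\mathbf{v}|^2$ is constant along the path.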

2.3.5 Inertial Motion, Maximization of Proper Time, and the Twin “Paradox”

It is important to clarify that particles follow geodesics only if they are in inertial motion, or free-falling. By this we mean that the only force acting on the particle is that of gravity, and there are no other forces which would influence the particle’s trajectory. If non-gravitational forces act on the particle – or, for example, a spaceship uses its rockets – then it will no longer follow a geodesic.

Now, in Euclidean space, geodesics minimize distance. However, in a curved space with Lorentzian signature, geodesics instead maximize proper time – at least for massive particles, since for massless particles the proper time vanishes by definition. This can be seen from the fact that the geodesic equation may be derived by demanding that the variation of the proper time integral (2.6) vanishes. [Footnote 19: See e.g. [6], chapter 3.3.]

The two facts we have just mentioned provide an elegant solution to the famous twin “paradox” of special relativity. In this “paradox” there are two twins, Alice and Bob. Alice stays on Earth while Bob goes on a round trip to a nearby planet, traveling at a significant fraction of the speed of light. When Bob returns to Earth, the twins discover that Alice is now older than Bob, due to relativistic time dilation.

The “paradox” lies in the naive assumption that, since from Bob’s point of view he was the one who stayed in place while Alice was the one moving with respect to him, then Bob would expect himself to be the older twin. However, obviously both twins cannot be older than each other, which leads to a “paradox”. The solution to the “paradox” lies in the fact that Alice remained in inertial motion for the entire time, while Bob was accelerating and thus not in inertial motion, hence there is an asymmetry between the twins.

Alternatively, if one does not wish to complicate things by involving acceleration, we may assume that all of the accelerations involved (speeding up, slowing down, and turning around) were instantaneous. Then Bob was also in inertial motion for the entire time, except for the moment of turnaround.

At that moment, the inertial frame for the outbound journey is replaced with a completely different inertial frame for the return journey. That one singular moment of turnaround is alone responsible for the asymmetry between the points of view of the twins. It is easy to see by drawing spacetime diagrams (as the reader is encouraged to do) that at the moment of turnaround, Bob’s notion of simultaneity (that is, surfaces of constant $t$) changes dramatically. From Bob’s point of view, Alice ages instantaneously at that moment!

This is the standard solution from the special relativistic point of view. However, now that we know about general relativity and geodesics, we may provide a much simpler solution. Since Alice remains in inertial motion, and the only forces that act on her are gravitational forces, she simply follows a timelike geodesic. On the other hand, since Bob is using non-gravitational forces (e.g. rockets) to accelerate himself, he will not follow a timelike geodesic. Of course, he will still follow a timelike path, but the path will not be a geodesic.

In other words, Alice’s and Bob’s paths in spacetime share the same starting point and the same end point, but Alice follows a timelike geodesic while Bob follows a timelike path which is not a geodesic. Since timelike geodesics are exactly the timelike paths which maximize proper time, the proper time experienced by Alice must be larger than the proper time experienced by Bob. Therefore, Alice must be the older twin.
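To put numbers on this argument, here is a minimal sketch under stated assumptions: flat spacetime, units where $c = 1$, constant speed on each leg, and an instantaneous turnaround. The function name `proper_times` and the sample values (a planet 4 light-years away, a speed of $0.8c$) are hypothetical and chosen only for illustration. Alice's proper time equals the coordinate time in her rest frame, while Bob's is the integral of $\sqrt{1 - v^2}\,dt$ along his path.

```python
import math

def proper_times(distance_ly, speed_c):
    """Return (Alice's, Bob's) elapsed proper time in years for a round trip.

    Assumes flat spacetime, constant speed on each leg, and an
    instantaneous turnaround (all illustrative assumptions).
    """
    if not 0.0 < speed_c < 1.0:
        raise ValueError("speed must be a fraction of c strictly between 0 and 1")
    # Alice stays on a geodesic: her proper time is the coordinate time.
    alice = 2.0 * distance_ly / speed_c
    # Bob's proper time is the integral of sqrt(1 - v^2) dt along his path.
    bob = alice * math.sqrt(1.0 - speed_c ** 2)
    return alice, bob

alice, bob = proper_times(distance_ly=4.0, speed_c=0.8)
print(f"Alice ages {alice:.2f} years, Bob ages {bob:.2f} years")
```

With these sample values Alice ages about 10 years and Bob about 6, so the geodesic (Alice's path) is indeed the one with the larger proper time.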

2.4 Curvature

2.4.1 The Riemann Curvature Tensor

The Riemann curvature tensor is defined as the commutator of the action of two covariant derivatives on a vector (footnote 20: note that if there is torsion, that is $\Gamma^\lambda_{\mu\nu} \neq \Gamma^\lambda_{\nu\mu}$, then one must add a term $2\Gamma^\lambda_{[\mu\nu]} \nabla_\lambda v^\rho$ to the right-hand side of this equation, where $2\Gamma^\lambda_{[\mu\nu]} \equiv \Gamma^\lambda_{\mu\nu} - \Gamma^\lambda_{\nu\mu}$):

$$R^\rho{}_{\sigma\mu\nu} v^\sigma = [\nabla_\mu, \nabla_\nu] v^\rho. \tag{2.17}$$

Since the covariant derivative facilitates parallel transport, this can, roughly speaking, be understood as taking the vector $v^\rho$ from some point $p$ along a path given by $\nabla_\mu \nabla_\nu v^\rho$ to a nearby point $q$, and then taking it back from $q$ along a path given by $-\nabla_\nu \nabla_\mu v^\rho$ to $p$. In other words, we take $v^\rho$ along a loop. If spacetime were flat, we would expect that the vector $v^\rho$ would remain unchanged after going around the loop. However, if spacetime is curved, there is a difference between the initial $v^\rho$ and the final $v^\rho$ (note that both are in the same tangent space $T_p M$, so we may compare them). This difference is encoded in the Riemann tensor $R^\rho{}_{\sigma\mu\nu}$.

The full coordinate expression for the Riemann tensor in terms of the connection coefficients may be calculated from the definition, and it is given by:

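$$R^\rho{}_{\sigma\mu\nu} = \partial_\mu \Gamma^\rho_{\nu\sigma} - \partial_\nu \Gamma^\rho_{\mu\sigma} + \Gamma^\rho_{\mu\lambda} \Gamma^\lambda_{\nu\sigma} - \Gamma^\rho_{\nu\lambda} \Gamma^\lambda_{\mu\sigma}.$$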
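As a quick illustration of how this formula is used, here is a worked sketch for the unit 2-sphere. The choice of example, the coordinates $(\theta, \phi)$, and the quoted connection coefficients are assumptions brought in for illustration (they are the standard ones for this metric), not something taken from the text above.

```latex
\begin{align}
ds^2 &= d\theta^2 + \sin^2\theta \, d\phi^2, \\
\Gamma^\theta_{\phi\phi} &= -\sin\theta\cos\theta, \qquad
\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta
\qquad \text{(all others vanish)}, \\
R^\theta{}_{\phi\theta\phi}
 &= \partial_\theta \Gamma^\theta_{\phi\phi}
  - \partial_\phi \Gamma^\theta_{\theta\phi}
  + \Gamma^\theta_{\theta\lambda} \Gamma^\lambda_{\phi\phi}
  - \Gamma^\theta_{\phi\lambda} \Gamma^\lambda_{\theta\phi} \\
 &= -\left( \cos^2\theta - \sin^2\theta \right) - 0 + 0
  - \left( -\sin\theta\cos\theta \right) \cot\theta \\
 &= \sin^2\theta .
\end{align}
```

The result is nonzero, as expected for a curved surface; in two dimensions this single component determines the whole tensor.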

If we lower the first index with the metric ($R_{\rho\sigma\mu\nu} = g_{\lambda\rho} R^\lambda{}_{\sigma\mu\nu}$), the resulting tensor satisfies the following identities:

  • Symmetry and anti-symmetry under exchange of indices: $R_{\rho\sigma\mu\nu} = -R_{\sigma\rho\mu\nu} = -R_{\rho\sigma\nu\mu} = R_{\mu\nu\rho\sigma}$.

  • First Bianchi identity: $R_{\rho[\sigma\mu\nu]} = 0$, or more explicitly $R_{\rho\sigma\mu\nu} + R_{\rho\mu\nu\sigma} + R_{\rho\nu\sigma\mu} = 0$ (footnote 21: here $6 R_{\rho[\sigma\mu\nu]} \equiv R_{\rho\sigma\mu\nu} + R_{\rho\mu\nu\sigma} + R_{\rho\nu\sigma\mu} - R_{\rho\mu\sigma\nu} - R_{\rho\nu\mu\sigma} - R_{\rho\sigma\nu\mu}$ is the usual anti-symmetrizer, and then we use the anti-symmetry in the last two indices).

  • Second Bianchi identity: $\nabla_{[\lambda} R_{\rho\sigma]\mu\nu} = 0$.

2.4.2 Related Tensors: Ricci and Weyl

By contracting the first and third index of the Riemann tensor, we obtain the Ricci tensor:

$$R_{\mu\nu} \equiv R^\lambda{}_{\mu\lambda\nu}.$$

Note that it is symmetric, $R_{\mu\nu} = R_{\nu\mu}$. The trace of the Ricci tensor is the Ricci scalar:

$$R \equiv R^\mu{}_\mu \equiv g^{\mu\nu} R_{\mu\nu}.$$
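As a small worked sketch of these contractions, consider again a hypothetical example not taken from the text: the unit 2-sphere with metric $ds^2 = d\theta^2 + \sin^2\theta \, d\phi^2$, for which the standard result $R^\theta{}_{\phi\theta\phi} = \sin^2\theta$ is assumed here without derivation.

```latex
\begin{align}
R_{\phi\phi} &= R^\lambda{}_{\phi\lambda\phi}
              = R^\theta{}_{\phi\theta\phi} = \sin^2\theta, \\
R_{\theta\theta} &= R^\lambda{}_{\theta\lambda\theta}
              = g^{\phi\phi} R_{\phi\theta\phi\theta}
              = \frac{1}{\sin^2\theta} \, \sin^2\theta = 1, \\
R &= g^{\theta\theta} R_{\theta\theta} + g^{\phi\phi} R_{\phi\phi}
   = 1 + 1 = 2 .
\end{align}
```

Here $R_{\phi\theta\phi\theta} = R_{\theta\phi\theta\phi} = g_{\theta\theta} R^\theta{}_{\phi\theta\phi}$ follows from the two anti-symmetries of the lowered tensor. The Ricci tensor comes out proportional to the metric, $R_{\mu\nu} = g_{\mu\nu}$, and the Ricci scalar is constant, as one would expect for a maximally symmetric surface.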

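We may also define (in 4 dimensions; footnote 22: for a spacetime of dimension $d$ we have $C_{\rho\sigma\mu\nu} \equiv R_{\rho\sigma\mu\nu} - \frac{2}{d-2} \left( g_{\rho[\mu} R_{\nu]\sigma} - g_{\sigma[\mu} R_{\nu]\rho} - \frac{1}{d-1} g_{\rho[\mu} g_{\nu]\sigma} R \right)$) the Weyl tensor:

$$C_{\rho\sigma\mu\nu} \equiv R_{\rho\sigma\mu\nu} - g_{\rho[\mu} R_{\nu]\sigma} + g_{\sigma[\mu} R_{\nu]\rho} + \frac{1}{3} g_{\rho[\mu} g_{\nu]\sigma} R.$$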

The Weyl tensor has all the symmetries of the Riemann tensor (including the first Bianchi identity), but it is completely traceless: it vanishes upon contraction of any pair of indices.
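The tracelessness can be checked directly. The following sketch contracts the first and third indices of the 4-dimensional definition with $g^{\rho\mu}$, assuming the usual normalization of the anti-symmetrizer, $g_{\rho[\mu} R_{\nu]\sigma} = \frac{1}{2} ( g_{\rho\mu} R_{\nu\sigma} - g_{\rho\nu} R_{\mu\sigma} )$, and $g^{\rho\mu} g_{\rho\mu} = 4$.

```latex
\begin{align}
g^{\rho\mu} R_{\rho\sigma\mu\nu} &= R_{\sigma\nu}, \\
g^{\rho\mu} g_{\rho[\mu} R_{\nu]\sigma}
  &= \tfrac{1}{2} \left( 4 R_{\nu\sigma} - R_{\nu\sigma} \right)
   = \tfrac{3}{2} R_{\sigma\nu}, \\
g^{\rho\mu} g_{\sigma[\mu} R_{\nu]\rho}
  &= \tfrac{1}{2} \left( R_{\nu\sigma} - g_{\sigma\nu} R \right), \\
g^{\rho\mu} g_{\rho[\mu} g_{\nu]\sigma}
  &= \tfrac{1}{2} \left( 4 g_{\nu\sigma} - g_{\nu\sigma} \right)
   = \tfrac{3}{2} g_{\sigma\nu}, \\
g^{\rho\mu} C_{\rho\sigma\mu\nu}
  &= R_{\sigma\nu} - \tfrac{3}{2} R_{\sigma\nu}
   + \tfrac{1}{2} \left( R_{\sigma\nu} - g_{\sigma\nu} R \right)
   + \tfrac{1}{3} \cdot \tfrac{3}{2} \, g_{\sigma\nu} R = 0 .
\end{align}
```

The remaining contractions either vanish by anti-symmetry or reduce to this one through the index symmetries shared with the Riemann tensor.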

2.5 Extrinsic Curvature

2.5.1 Intrinsic and Extrinsic Curvature

Let us now consider a surface embedded in a higher-dimensional space. The Riemann tensor describes the intrinsic curvature of the surface; this is the curvature within the surface itself, which exists intrinsically, regardless of any embedding (as long as the embedding is isometric, see below).

In contrast, extrinsic curvature is the curvature of the surface which comes from the way in which it is embedded, and from the particular space in which it is embedded. Intrinsic curvature comes from the parallel transport of vectors tangent to (a curve on) the surface, while extrinsic curvature comes from parallel transport of vectors normal to the surface.

For example, a flat piece of paper has no intrinsic curvature. If we draw a triangle on it, the angles will sum to π . However, if we roll that paper into a cylinder, then as seen from the (flat) higher-dimensional space in which we live, the angles of the triangle will no longer sum to π . Thus, the surface has acquired an extrinsic curvature.

However, the surface is still intrinsically flat, since within the surface itself, the edges of the triangle are still geodesics, regardless of its embedding into the higher-dimensional space. Therefore, the angles still sum to $\pi$ when viewed from inside the surface. In other words, the intrinsic curvature is completely independent of any isometric embedding of the surface.

Let us see this with a concrete calculation. We take a cylinder with circumference $2\pi L$. It can be obtained by “rolling up” $\mathbb{R}^2$, i.e. by performing the periodic identification

$$(x, y) \sim (x, y + 2\pi L).$$

Topologically, this is homeomorphic to $\mathbb{R} \times S^1$ (footnote 23: a homeomorphism between topological spaces $X$ and $Y$ is a continuous function that has a continuous inverse. The importance of homeomorphisms is that they preserve the topological properties of the space. If there is a homeomorphism between two spaces, we say that they are homeomorphic to each other). The metric is inherited from the original (unrolled) $\mathbb{R}^2$, and is thus flat. Now, let us embed the cylinder in $\mathbb{R}^3$. Introducing cylindrical coordinates $(r, \phi, z)$, the metric on $\mathbb{R}^3$ takes the form

$$ds^2 = dr^2 + r^2 d\phi^2 + dz^2. \tag{2.18}$$

Taking a constant $r = L$ and identifying the points

$$(L, \phi, z) \sim (L, \phi + 2\pi, z),$$

we get the same cylinder with circumference $2\pi L$, and it has the induced metric (with $dr = 0$ since $r$ is constant)

$$ds^2 \big|_{r=L} = L^2 d\phi^2 + dz^2.$$

Since the components of this induced metric are constant, the intrinsic curvature of the cylinder is zero.

2.5.2 Isometric and Non-Isometric Embeddings

Let $(M, g_{\mu\nu})$ and $(N, h_{\mu\nu})$ be two Riemannian manifolds, with metrics $g_{\mu\nu}$ and $h_{\mu\nu}$. An immersion between the manifolds is a differentiable function $f: M \to N$ with an injective (one-to-one) derivative. An embedding of $M$ into $N$ is an injective immersion such that $M$ is homeomorphic to its image $f(M)$. The embedding is called isometric if it also preserves the metric, that is, $g_{\mu\nu} = f^* h_{\mu\nu}$, where $f^* h_{\mu\nu}$ is the pullback of $h_{\mu\nu}$ by $f$ (footnote 24: the pullback $f^* \mathbf{T}$ of a tensor $\mathbf{T}$ by the map $f: M \to N$ literally “pulls back” $\mathbf{T}$ from $N$ into the source manifold $M$. In the case of a metric $h_{\mu\nu}$, which is a rank $(0, 2)$ tensor, the pullback acts on its components as follows: $(f^* h(\mathbf{x}))_{\mu\nu} = \frac{\partial y^\alpha}{\partial x^\mu} \frac{\partial y^\beta}{\partial x^\nu} h_{\alpha\beta}(\mathbf{y})$, where $\mathbf{y}$ are coordinates on $N$ and $\mathbf{x}$ are coordinates on $M$).

The embedding of the cylinder $\mathbb{R} \times S^1$ into $\mathbb{R}^3$ is isometric, since the metric induced on it from $\mathbb{R}^3$ is flat, and thus equal to the original metric on the cylinder. Therefore, the intrinsic curvature remains unchanged after the embedding. However, there are cases when an embedding forces us to change the intrinsic curvature of a manifold. This happens when we cannot embed our manifold in another one without stretching or compressing it.

An illustrative example is given by the torus, which may be obtained from $\mathbb{R}^2$ in a similar manner to the cylinder, except that now both coordinates are identified, with periods $2\pi L_1$ and $2\pi L_2$:

$$(x, y) \sim (x + 2\pi L_1, y) \sim (x, y + 2\pi L_2).$$

Topologically, this is homeomorphic to $S^1 \times S^1$. The metric inherited from $\mathbb{R}^2$ is, of course, still flat. However, let us now embed the torus in $\mathbb{R}^3$, with the same cylindrical coordinates as before. The surface will be defined as the set of points solving the equation

$$z^2 + (r - L_1)^2 = L_2^2.$$

Indeed, for each value of $\phi$, this equation defines a circle with radius $L_2$ centered at $(r, z) = (L_1, 0)$. The surface of revolution of a circle is a torus; it can also be seen as a cylinder whose top and bottom have been glued together. Thus $L_1$ is the major radius, or the distance from the $z$ axis to the center of the circle, while $L_2$ is the minor radius, that of the circle that is being revolved.

Let us now isolate $z$:

$$z = \pm \sqrt{L_2^2 - (r - L_1)^2}.$$

Differentiating, we find

$$dz = \mp \frac{(r - L_1) \, dr}{\sqrt{L_2^2 - (r - L_1)^2}},$$

and by plugging this into the flat metric in cylindrical coordinates on $\mathbb{R}^3$, (2.18), we obtain the following induced metric:

$$ds^2 \big|_{z^2 + (r - L_1)^2 = L_2^2} = \frac{L_2^2}{L_2^2 - (r - L_1)^2} \, dr^2 + r^2 d\phi^2,$$

where $L_1 - L_2 \le r \le L_1 + L_2$. This metric is not flat, as can be checked e.g. by calculating the Ricci scalar, which is

$$R = \frac{2 (r - L_1)}{L_2^2 \, r}.$$

Thus, in this case the induced metric is no longer flat: the embedding has changed the intrinsic curvature.
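One way to check this Ricci scalar by hand is sketched below. It introduces an angle $\theta$ around the minor circle, via $r = L_1 + L_2 \cos\theta$ and $z = L_2 \sin\theta$, which is a parametrization assumed here for convenience and not used in the text. It also relies on two standard facts: for a metric of the form $a^2 d\theta^2 + h(\theta)^2 d\phi^2$ with constant $a$ the Gaussian curvature is $K = -h''/(a^2 h)$, and in two dimensions $R = 2K$.

```latex
\begin{align}
ds^2 &= L_2^2 \, d\theta^2 + \left( L_1 + L_2 \cos\theta \right)^2 d\phi^2, \\
h(\theta) &= L_1 + L_2 \cos\theta, \qquad h''(\theta) = -L_2 \cos\theta, \\
K &= -\frac{h''}{L_2^2 \, h}
   = \frac{\cos\theta}{L_2 \left( L_1 + L_2 \cos\theta \right)}, \\
R &= 2K = \frac{2 \cos\theta}{L_2 \, r}
   = \frac{2 (r - L_1)}{L_2^2 \, r},
\end{align}
```

where the last step uses $\cos\theta = (r - L_1)/L_2$. The curvature is positive on the outer half of the torus ($r > L_1$), negative on the inner half, and zero on the top and bottom circles ($r = L_1$).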

In the case of the cylinder, all we did when we embedded it in $\mathbb{R}^3$ was to take a flat plane and glue two opposite ends of it together. This can be easily illustrated by taking a flat piece of paper and bending it to create a cylinder. However, for the torus, we started with a cylinder, which is still intrinsically flat, and glued its top and bottom together. This cannot be done without stretching the paper (try it!), resulting in an intrinsic curvature that is no longer flat. In other words, this embedding is not isometric (footnote 25: note, however, that it is possible to isometrically embed the torus in $\mathbb{R}^4$).

2.5.3 Surfaces and Normal Vectors

Let $\Sigma$ be a surface embedded in a (higher-dimensional) manifold $M$. This surface may be defined, for example, by an equation of the form $f(\mathbf{x}) = 0$, as we did for the torus. Then the vector field $\xi^\mu \equiv \nabla^\mu f = \partial^\mu f$ is everywhere normal to the surface, meaning that it is orthogonal to every vector tangent to $\Sigma$ (we will not prove this here). The nature of the surface is determined by that of the normal vector: if $\xi^\mu$ is timelike the surface is spacelike, if $\xi^\mu$ is spacelike the surface is timelike, and if $\xi^\mu$ is null the surface is also null.

If $\xi^\mu$ is timelike or spacelike, we may normalize it and define a unit normal vector:

$$n^\mu \equiv \frac{\xi^\mu}{\sqrt{|\xi_\lambda \xi^\lambda|}}.$$

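If $\xi^\mu$ is null, then it is not only normal but also tangent to $\Sigma$, since it is orthogonal to itself. Thus the integral curves $x^\mu(\lambda)$ of the vector field $\xi^\mu$, which are curves such that their tangent vectors are equal to $\xi^\mu$ at every point along the curve:

$$\frac{dx^\mu}{d\lambda} = \xi^\mu, \tag{2.19}$$

are null curves contained in $\Sigma$. Now, we have

\begin{align}
\xi^\mu \nabla_\mu \xi_\nu &= \xi^\mu \nabla_\mu \partial_\nu f \tag{2.20} \\
&= \xi^\mu \left( \partial_\mu \partial_\nu f - \Gamma^\lambda_{\mu\nu} \partial_\lambda f \right) \tag{2.21} \\
&= \xi^\mu \nabla_\nu \partial_\mu f \tag{2.22} \\
&= \xi^\mu \nabla_\nu \xi_\mu \tag{2.23} \\
&= \frac{1}{2} \nabla_\nu \left( \xi^\mu \xi_\mu \right), \tag{2.24}
\end{align}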

where we used the fact that $\Gamma^\lambda_{\mu\nu}$ is symmetric in its lower indices. Unfortunately, $\xi^\mu \xi_\mu$ does not necessarily vanish outside of $\Sigma$, so we cannot conclude that the last term vanishes. However, if it does not vanish, we simply redefine the defining equation $f(\mathbf{x}) = 0$ for the surface using $f(\mathbf{x}) \equiv \xi^\mu \xi_\mu$, and then the last expression, (2.24), is simply $\frac{1}{2} \partial_\nu f(\mathbf{x})$, which is normal to the surface and thus must be proportional to $\xi_\nu$! If the proportionality function is $\alpha$, then we get

$$\xi^\mu \nabla_\mu \xi_\nu = \alpha \, \xi_\nu,$$

which is the generalized geodesic equation (2.11). As we have seen when discussing that equation, we may reparameterize the geodesic using an affine parameter $\lambda$ such that $\xi^\mu \nabla_\mu \xi_\nu = 0$. The null geodesics $x^\mu(\lambda)$ defined by (2.19) are called the generators of the null surface $\Sigma$, since the surface is the union of these geodesics.
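A simple illustration may help here. The sketch below assumes Minkowski spacetime with signature $(-,+,+,+)$ and Cartesian coordinates $(t, x, y, z)$, and takes the null plane $f = t - x = 0$ as a hypothetical example; none of these choices is fixed by the discussion above.

```latex
\begin{align}
\xi_\mu &= \partial_\mu f = (1, -1, 0, 0), \\
\xi^\mu &= \eta^{\mu\nu} \xi_\nu = (-1, -1, 0, 0), \\
\xi^\mu \xi_\mu &= (-1)(1) + (-1)(-1) = 0, \\
\frac{dx^\mu}{d\lambda} &= \xi^\mu
 \;\Longrightarrow\;
 t(\lambda) = t_0 - \lambda, \quad x(\lambda) = x_0 - \lambda, \quad
 y(\lambda) = y_0, \quad z(\lambda) = z_0 .
\end{align}
```

Along each integral curve $t - x = t_0 - x_0$ stays constant, so a curve that starts on the plane remains in it. Since the components of $\xi^\mu$ are constant and the connection vanishes in these coordinates, $\xi^\mu \nabla_\mu \xi_\nu = 0$, so in this example $\alpha = 0$ and $\lambda$ is already an affine parameter. The plane is the union of these straight null lines, which are its generators.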

2.5.4 The Projector on the Surface

Let $M$ be a manifold with metric $g_{\mu\nu}$. We define the projector on the surface $\Sigma$ with unit normal $n^\mu$ as follows:

$$P_{\mu\nu} \equiv g_{\mu\nu} - |\mathbf{n}|^2 n_\mu n_\nu. \tag{2.25}$$

Note that $|\mathbf{n}|^2 \equiv n_\lambda n^\lambda = \pm 1$, with $+$ for timelike and $-$ for spacelike surfaces. This symmetric tensor projects any vector field $v^\mu$ in $M$ to a tangent vector $P^\mu{}_\nu v^\nu$ on $\Sigma$. We can see this by showing that the projected vector is orthogonal to the normal vector:

$$n_\mu \left( P^\mu{}_\nu v^\nu \right) = n_\mu v^\mu - |\mathbf{n}|^4 n_\nu v^\nu = 0,$$

since $|\mathbf{n}|^4 = 1$ whether $n^\mu$ is spacelike or timelike. Furthermore, the projector acts like a metric on vectors tangent to $\Sigma$:

$$P_{\mu\nu} v^\mu w^\nu = g_{\mu\nu} v^\mu w^\nu = \langle \mathbf{v}, \mathbf{w} \rangle,$$

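since $n_\mu v^\mu = n_\mu w^\mu = 0$ for tangent vectors. Also, the projector remains unchanged under its own action:

\begin{align}
P^\mu{}_\lambda P^\lambda{}_\nu &= \left( \delta^\mu_\lambda - |\mathbf{n}|^2 n^\mu n_\lambda \right) \left( \delta^\lambda_\nu - |\mathbf{n}|^2 n^\lambda n_\nu \right) \\
&= \delta^\mu_\nu - 2 |\mathbf{n}|^2 n^\mu n_\nu + |\mathbf{n}|^6 n^\mu n_\nu = P^\mu{}_\nu,
\end{align}

since $|\mathbf{n}|^6 = |\mathbf{n}|^2$.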
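These properties are easy to check numerically. The following is a minimal sketch, assuming flat Minkowski spacetime with signature $(-,+,+,+)$; the helper names (`lower`, `projector`, `apply`, `normal_component`, `idempotency_error`) and the sample vectors are hypothetical and serve only as illustration. It builds the mixed-index projector $P^\mu{}_\nu = \delta^\mu_\nu - |\mathbf{n}|^2 n^\mu n_\nu$ for one timelike and one spacelike unit normal, then tests orthogonality to the normal and idempotency.

```python
import math

# Diagonal of the Minkowski metric; the signature (-,+,+,+) is an assumption.
ETA = [-1.0, 1.0, 1.0, 1.0]

def lower(v_up):
    """Lower an index with the (diagonal) Minkowski metric."""
    return [ETA[i] * v_up[i] for i in range(4)]

def projector(n_up):
    """Mixed-index projector P^mu_nu = delta^mu_nu - |n|^2 n^mu n_nu."""
    n_dn = lower(n_up)
    norm2 = sum(n_up[i] * n_dn[i] for i in range(4))  # |n|^2, close to +1 or -1
    return [[(1.0 if mu == nu else 0.0) - norm2 * n_up[mu] * n_dn[nu]
             for nu in range(4)] for mu in range(4)]

def apply(P, v_up):
    """Compute P^mu_nu v^nu."""
    return [sum(P[mu][nu] * v_up[nu] for nu in range(4)) for mu in range(4)]

def normal_component(n_up, v_up):
    """Compute n_mu v^mu."""
    n_dn = lower(n_up)
    return sum(n_dn[i] * v_up[i] for i in range(4))

def idempotency_error(P):
    """Largest entry of |P^mu_lambda P^lambda_nu - P^mu_nu|."""
    return max(abs(sum(P[m][l] * P[l][n] for l in range(4)) - P[m][n])
               for m in range(4) for n in range(4))

# Unit timelike normal of a boosted spacelike surface (hypothetical values).
speed = 0.6
gamma = 1.0 / math.sqrt(1.0 - speed ** 2)
n_timelike = [gamma, gamma * speed, 0.0, 0.0]
# Unit spacelike normal of the timelike surface z = const.
n_spacelike = [0.0, 0.0, 0.0, 1.0]

v = [1.0, 2.0, 3.0, 4.0]  # arbitrary vector to project
for n in (n_timelike, n_spacelike):
    P = projector(n)
    print(normal_component(n, apply(P, v)), idempotency_error(P))
```

Both printed numbers should be zero up to floating-point rounding for either choice of normal, reflecting $|\mathbf{n}|^4 = 1$ and $|\mathbf{n}|^6 = |\mathbf{n}|^2$.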

2.5.5 Definition of Extrinsic Curvature

The extrinsic curvature tensor $K_{\mu\nu}$ (footnote 26: sometimes called the second fundamental form in the differential geometry literature) is a rank $(0, 2)$ symmetric tensor defined as the Lie derivative of the projector $P_{\mu\nu}$ in the direction of the normal vector $n^\mu$ (footnote 27: we will not go into the rigorous definition of the Lie derivative here, but just note that a general formula for its action on tensors is $\mathcal{L}_{\mathbf{n}} T^{\mu\cdots}{}_{\nu\cdots} = n^\lambda \partial_\lambda T^{\mu\cdots}{}_{\nu\cdots} - T^{\lambda\cdots}{}_{\nu\cdots} \partial_\lambda n^\mu - \cdots + T^{\mu\cdots}{}_{\lambda\cdots} \partial_\nu n^\lambda + \cdots$, where for each upper index of $\mathbf{T}$ we add a negative term with the index on $\mathbf{n}$ exchanged with that index, and for each lower index of $\mathbf{T}$ we add a positive term with the index on the partial derivative exchanged with that index. Note that the partial derivatives may be replaced with covariant derivatives, since the extra terms all cancel (show this)):

$$K_{\mu\nu} \equiv \frac{1}{2} \mathcal{L}_{\mathbf{n}} P_{\mu\nu}. \tag{2.26}$$

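From the coordinate expression for the Lie derivative (see footnote 27) it is relatively straightforward (try it!) to show that

\begin{align}
K_{\mu\nu} &= P_\mu{}^\alpha P_\nu{}^\beta \nabla_{(\alpha} n_{\beta)} \\
&= \frac{1}{2} P_\mu{}^\alpha P_\nu{}^\beta \mathcal{L}_{\mathbf{n}} g_{\alpha\beta},
\end{align}

since by definition $\mathcal{L}_{\mathbf{n}} g_{\alpha\beta} = 2 \nabla_{(\alpha} n_{\beta)}$.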

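Finally, let us assume that $n^\mu$ is tangent to a geodesic. This is generally possible by extending it off the surface $\Sigma$ using the geodesic equation, $n^\lambda \nabla_\lambda n^\mu = 0$. Then we have

\begin{align}
\mathcal{L}_{\mathbf{n}} n_\nu &= n^\lambda \nabla_\lambda n_\nu + n_\lambda \nabla_\nu n^\lambda \\
&= 0 + \frac{1}{2} \nabla_\nu \left( n_\lambda n^\lambda \right) \\
&= 0,
\end{align}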

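since $n_\lambda n^\lambda$ is a constant. Therefore we find that

\begin{align}
K_{\mu\nu} &= \frac{1}{2} \mathcal{L}_{\mathbf{n}} P_{\mu\nu} \tag{2.27} \\
&= \frac{1}{2} \mathcal{L}_{\mathbf{n}} \left( g_{\mu\nu} - |\mathbf{n}|^2 n_\mu n_\nu \right) \tag{2.28} \\
&= \frac{1}{2} \mathcal{L}_{\mathbf{n}} g_{\mu\nu} \tag{2.29} \\
&= \nabla_{(\mu} n_{\nu)}. \tag{2.30}
\end{align}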
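As a worked sketch of this result, consider the cylinder $r = L$ embedded in flat $\mathbb{R}^3$ with cylindrical coordinates $(r, \phi, z)$ and metric $ds^2 = dr^2 + r^2 d\phi^2 + dz^2$. The choice of the outward radial unit normal and the use of the trace $K \equiv g^{\mu\nu} K_{\mu\nu}$ are assumptions made here for illustration. Since the components of $n^\mu$ are constant, the Lie derivative of the metric reduces to $\partial_r g_{\mu\nu}$.

```latex
\begin{align}
n_\mu &= \partial_\mu r = (1, 0, 0), \qquad n^\mu = (1, 0, 0), \qquad
|\mathbf{n}|^2 = 1, \\
n^\lambda \nabla_\lambda n^\mu &= \Gamma^\mu_{rr} = 0
\qquad \text{(so $n^\mu$ is tangent to geodesics)}, \\
K_{\mu\nu} &= \frac{1}{2} \mathcal{L}_{\mathbf{n}} g_{\mu\nu}
            = \frac{1}{2} \partial_r g_{\mu\nu}
\;\Longrightarrow\;
K_{\phi\phi} = \frac{1}{2} \partial_r \left( r^2 \right) \Big|_{r=L} = L,
\qquad \text{all other components vanish}, \\
K &= g^{\mu\nu} K_{\mu\nu} = \frac{1}{L^2} \, L = \frac{1}{L} .
\end{align}
```

So the cylinder has nonzero extrinsic curvature, which grows as the radius $L$ shrinks, even though its intrinsic curvature vanishes.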

2.6 Einstein’s Equation

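Let us define the Einstein-Hilbert action (footnote 28: the volume form $\boldsymbol{\epsilon} \equiv \frac{1}{4!} \epsilon_{\rho\sigma\mu\nu} \, dx^\rho \wedge dx^\sigma \wedge dx^\mu \wedge dx^\nu = \sqrt{-g} \, d^4 x$ is a 4-form also known as the Levi-Civita tensor, and its components $\epsilon_{\rho\sigma\mu\nu}$ are related to the familiar totally anti-symmetric Levi-Civita symbol $\tilde{\epsilon}_{\rho\sigma\mu\nu}$ by $\epsilon_{\rho\sigma\mu\nu} = \sqrt{-g} \, \tilde{\epsilon}_{\rho\sigma\mu\nu}$. Note that $\tilde{\epsilon}_{\rho\sigma\mu\nu}$ is not a tensor but a tensor density of weight 1, since it is related to a tensor by a factor of one power of $\sqrt{-g}$):

$$S_H \equiv \frac{1}{16\pi} \int R \sqrt{-g} \, d^4 x, \tag{2.31}$$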

where $g \equiv \det \mathbf{g}$. The factor of $1/16\pi$ in front of the action (footnote 29: if we did not use units where $G \equiv 1$, this would instead be $1/16\pi G$) is a convention chosen to produce the correct form of Newton’s law of gravitation $F = m_1 m_2 / r^2$ in the Newtonian limit. By varying this action with respect to the metric, we obtain the vacuum Einstein equation:

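$$\frac{1}{\sqrt{-g}} \frac{\delta S_H}{\delta g^{\mu\nu}} = \frac{1}{16\pi} \left( R_{\mu\nu} - \frac{1}{2} R g_{\mu\nu} \right) = 0 \quad \Longrightarrow \quad R_{\mu\nu} - \frac{1}{2} R g_{\mu\nu} = 0. \tag{2.32}$$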
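A short consequence of the vacuum equation is worth spelling out. The sketch below takes the trace of (2.32) with the inverse metric, assuming 4 spacetime dimensions so that $g^{\mu\nu} g_{\mu\nu} = 4$.

```latex
\begin{align}
g^{\mu\nu} \left( R_{\mu\nu} - \frac{1}{2} R g_{\mu\nu} \right)
  &= R - \frac{1}{2} \cdot 4 R = -R = 0, \\
\Longrightarrow \quad R_{\mu\nu} &= 0 .
\end{align}
```

In vacuum the Ricci scalar therefore vanishes, and the equation reduces to the statement that the Ricci tensor vanishes. Curvature need not vanish, however: the Riemann tensor is then equal to the Weyl tensor, which is not constrained by this equation.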

To add matter, we simply add the appropriate action $S_M$ for the type of matter we would like to consider: