Lecture Notes for PHY 256:
Introduction to Quantum Physics
Table of Contents
- 1 Introduction
- 2 Non-Technical Overview
- 3 Mathematical Background
  - 3.1 Complex Numbers
  - 3.2 Linear Algebra
    - 3.2.1 Complex Vector Spaces
    - 3.2.2 Dual Vectors, Inner Products, Norms, and Hilbert Spaces
    - 3.2.3 Orthonormal Bases
    - 3.2.4 Matrices and the Adjoint
    - 3.2.5 The Outer Product
    - 3.2.6 The Completeness Relation
    - 3.2.7 Representing Vectors in Different Bases
    - 3.2.8 Change of Basis
    - 3.2.9 Multiplication and Inverse of Matrices
    - 3.2.10 Matrices Inside Inner Products
    - 3.2.11 Eigenvalues and Eigenvectors
    - 3.2.12 Hermitian Matrices
    - 3.2.13 Unitary Matrices
    - 3.2.14 Normal Matrices
    - 3.2.15 Representing Matrices in Different Bases
    - 3.2.16 Diagonalizable Matrices
    - 3.2.17 The Cauchy-Schwarz Inequality
  - 3.3 Probability Theory
- 4 The Foundations of Quantum Theory
  - 4.1 Axiomatic Definition
  - 4.2 Two-State Systems, Spin 1/2, and Qubits
  - 4.3 Composite Systems and Quantum Entanglement
  - 4.4 Non-Commuting Observables and the Uncertainty Principle
  - 4.5 Dynamics, Transformations, and Measurements
    - 4.5.1 Unitary Transformations and Evolution
    - 4.5.2 Quantum Logic Gates
    - 4.5.3 The Measurement Axiom (Projective)
    - 4.5.4 Applications of the Measurement Axiom
    - 4.5.5 The Measurement Axiom (Simplified)
    - 4.5.6 Interpretations of Quantum Mechanics and the Measurement Problem
    - 4.5.7 Superposition Once Again: Schrödinger’s Cat
  - 4.6 The No-Cloning Theorem and Quantum Teleportation
  - 4.7 The Foundations of Quantum Theory: Summary
- 5 Continuous Quantum Systems
  - 5.1 Mathematical Preliminaries
  - 5.2 Continuous Time Evolution, Hamiltonians, and the Schrödinger Equation
  - 5.3 Hamiltonian Mechanics and Canonical Quantization
  - 5.4 The Harmonic Oscillator
  - 5.5 Wavefunctions, Position, and Momentum
  - 5.6 Solutions of the Schrödinger Equation
1.1 Course Outline
This course will serve as a comprehensive introduction to the foundations of quantum mechanics, from the modern point of view of 21st century theoretical physics. It will be somewhat different from a traditional first course in quantum mechanics, in that we will develop the theory from scratch in an axiomatic and mathematically rigorous(ish) way. There will be less emphasis on doing calculations, and more on a deep conceptual understanding of the theory.
First, a short non-technical overview of quantum mechanics will be provided. We will discuss the failures of classical mechanics that prompted the development of the quantum theory, and list the major differences between classical and quantum mechanics.
Next, we will learn the necessary mathematical background, including complex numbers, linear algebra, and probability. Even if you took courses on these subjects before, you should still pay careful attention, since we will learn the material from the quantum point of view and introduce important notation that is unique to quantum mechanics.
Once we have a firm grasp of the mathematical background, we will use it to define quantum mechanics axiomatically. We will learn about fundamental concepts such as Hilbert spaces, states, operators, observables, superposition, probability amplitudes, and expectation values.
Then, we will begin studying simple discrete quantum systems known as qubits, which are the quantum analogue of bits, and are used in quantum computers. We will learn about Schrödinger’s cat, quantum entanglement, Bell’s theorem, the uncertainty principle, unitary evolution, quantum measurements, and quantum teleportation.
In the remainder of the course we will study continuous quantum systems and related concepts, including Hamiltonians, the Schrödinger equation, canonical quantization, the quantum harmonic oscillator, wavefunctions, quantum interference, and solutions to the time-independent Schrödinger equation, including scattering and tunneling in one dimension.
By the end of the course, the students should expect to have a fairly good understanding of quantum mechanics, and to develop an intuition for this very strange and unintuitive theory. They will also be adequately prepared to dive deeper into the subject, whether by taking more advanced courses or by doing research.
1.2 Exercises and Problems
Throughout these notes, you will find many exercises and problems.
Exercises are usually just calculations. They are meant to verify that you understand how to calculate things, and they are usually simple and straightforward.
Problems are usually proof-based. They are meant to verify that you understand the more abstract relations between the concepts we will introduce, and they often require some thought.
2 Non-Technical Overview
In this chapter, I will provide a non-technical overview of quantum physics, and how it compares to classical physics. I won’t go into exactly who discovered what and in which year, because this is not a history course; this is a course about how the universe works. However, if you are interested in the history of quantum mechanics, there are many excellent websites and textbooks on the subject, and you are encouraged to look them up.
Instead, I will focus on two main goals in this chapter:
- Introducing some of the fundamental experiments which illustrate why classical mechanics needs to be replaced with a more fundamental theory. This should also convince you that your classical intuition must be replaced with quantum intuition, which is what we will try to develop in this course.
- Summarizing the fundamental properties of quantum mechanics and the differences between it and classical mechanics in non-technical terms, without going into the math. This should give you some idea of what we will study throughout this course in much more detail and with the full, uncensored mathematical framework.
2.1 The Failures of Classical Physics
2.1.1 Black-Body Radiation and the Ultraviolet Catastrophe
A black body is an object that absorbs all incoming light at all frequencies. It absorbs it and does not reflect it – therefore, it is black. More generally, it absorbs not just light, but all electromagnetic radiation. Black bodies also emit radiation, due to their heat. Electromagnetic radiation spans a continuous range of wavelengths. We are interested in predicting the amount of radiation emitted by the black body at each wavelength, which we will refer to as the black body’s spectrum.
One can try to use classical physics to calculate this spectrum. It turns out that the amount of radiation is inversely proportional to a power of the wavelength (more precisely, the power emitted per unit area per unit solid angle per unit wavelength is proportional to $1/\lambda^4$, where $\lambda$ is the wavelength; but fortunately, we don’t need to be very precise here!). This means that as the wavelength approaches zero, the amount of radiation approaches infinity! This is illustrated by the black curve in Figure 2.1. This result is called the ultraviolet catastrophe, since ultraviolet light has shorter wavelengths than visible light. Obviously, this does not fit well with experimental data, since when we measure the total radiation emitted from a black body, we most definitely do not measure it to be infinity!
To solve this problem, we must use quantum physics. If we assume that radiation can only be emitted in discrete “packets” of energy called quanta, we get the correct spectrum of radiation, which is compatible with experiment. The law describing the amount of radiation at each wavelength is called Planck’s law. In Figure 2.1 , we can see three different curves, calculated using Planck’s law, giving the radiation spectrum at different temperatures (in Kelvin). You can see that the total amount of radiation is no longer infinite. The quanta of electromagnetic radiation are called photons.
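To make the comparison concrete, here is a minimal sketch that evaluates both spectra numerically. It assumes the standard textbook forms of the classical (Rayleigh-Jeans) and Planck spectral radiance per unit wavelength; the function names and the temperature of 5000 K are illustrative choices, not something fixed by these notes.

```python
import math

# Physical constants in SI units.
H = 6.62607015e-34   # Planck constant, J s
C = 299792458.0      # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def rayleigh_jeans(wavelength, temperature):
    """Classical spectral radiance; grows like 1/wavelength**4."""
    return 2.0 * C * K_B * temperature / wavelength**4


def planck(wavelength, temperature):
    """Spectral radiance from Planck's law; stays finite at short wavelengths."""
    x = H * C / (wavelength * K_B * temperature)
    # math.expm1(x) computes exp(x) - 1 accurately, even for small x.
    return 2.0 * H * C**2 / wavelength**5 / math.expm1(x)


T = 5000.0  # temperature in kelvin (illustrative)
for nm in (100, 500, 2000, 1000000):
    lam = nm * 1e-9  # convert nanometers to meters
    print(f"{nm:>8} nm  classical={rayleigh_jeans(lam, T):.3e}  planck={planck(lam, T):.3e}")
```

At long wavelengths the two columns nearly agree, while at short wavelengths the classical value keeps growing and the Planck value drops toward zero.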
2.1.2 The Photoelectric Effect
When light hits a material, it causes the material to emit electrons. This phenomenon is called the photoelectric effect. Using classical physics, and the assumption that light is a wave, we can make the following predictions:
- Brighter light should have more energy, so it should cause the emitted electrons to have more kinetic energy, and thus move faster.
- Light with higher frequency should hit the material more often, so it should cause a higher rate of electron emission, resulting in a larger electric current.
- Assuming there is a certain minimum energy needed to dislodge an electron from the material, sufficiently bright light of any frequency should cause electron emission.
However, what actually happens is the exact opposite:
- The kinetic energy of the emitted electrons increases with frequency, not brightness.
- The electric current increases with brightness, not frequency.
- Electrons are emitted only when the frequency of the light exceeds a certain threshold, regardless of how bright it is.
This is illustrated in Figure 2.2 , where the red light does not cause any electrons to be emitted, but the green and blue lights do, since they have higher frequency. Furthermore, since the blue light has higher frequency than the green light, the kinetic energy of the emitted electrons is larger.
To explain this, we must again use quantum physics. Einstein proposed to use the same model that Planck suggested to solve the ultraviolet catastrophe, where light is made of discrete photons. Each photon has energy proportional to the frequency of the light, and brighter light of the same frequency simply has more photons, each photon still with the same amount of energy. This model fits the observations perfectly.
So in Figure 2.2, making the red light brighter will increase the number of photons, but no matter how bright it is, the individual photons it’s made of still do not have enough energy to dislodge an electron on their own. On the other hand, each individual photon of the green and blue lights has, on its own, enough energy to dislodge an electron, and even if the light is very dim, the electrons will still be emitted.
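The photon picture can be turned into a short calculation. The sketch below assumes the standard relation that a photon of wavelength $\lambda$ carries energy $hc/\lambda$, and that an emitted electron keeps whatever is left after paying the minimum energy needed to dislodge it. The value of that minimum energy (2.0 eV) and the three wavelengths are hypothetical numbers chosen for illustration; they are not taken from Figure 2.2.

```python
H = 6.62607015e-34     # Planck constant, J s
C = 299792458.0        # speed of light, m/s
EV = 1.602176634e-19   # one electronvolt in joules

WORK_FUNCTION_EV = 2.0  # hypothetical minimum energy to dislodge an electron


def photon_energy_ev(wavelength_nm):
    """Energy of a single photon, in electronvolts."""
    return H * C / (wavelength_nm * 1e-9) / EV


def max_kinetic_energy_ev(wavelength_nm, work_function_ev=WORK_FUNCTION_EV):
    """Maximum kinetic energy of an emitted electron, or None if none is emitted."""
    excess = photon_energy_ev(wavelength_nm) - work_function_ev
    return excess if excess > 0 else None


# Illustrative wavelengths for red, green, and blue light.
for color, nm in (("red", 700), ("green", 550), ("blue", 450)):
    print(color, max_kinetic_energy_ev(nm))
```

Note that brightness does not appear anywhere in the function: it would only change how many photons arrive per second, not what each photon can do.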
2.1.3 The Double-Slit Experiment
The previous two experiments may have convinced you that light is not a wave, but a particle. But is that really the case? The double-slit experiment shows that things are actually more complicated. In this experiment, a light beam hits a plate with two parallel slits. Most of the light is blocked by the plate, but some of it passes through the slits and hits a screen, creating a pattern of bright and dark bands.
This can be most naturally explained by assuming that light is not a particle, but a wave. Each of the slits becomes the origin of a new wave, as illustrated in Figure 2.3 . Each of the two waves has crests and troughs. When a crest of one wave is at the same place as a crest of the other wave, they add up to create a crest with double the magnitude. This is called constructive interference. On the other hand, if a crest of one wave is at the same place as a trough of the other wave, they cancel each other. This is called destructive interference. See Figure 2.4 for an illustration. The pattern on the screen, as seen in Figure 2.3 , is a consequence of this interference.
So the double-slit experiment seems to prove that light is a wave, in contradiction with black-body radiation and the photoelectric effect, which seem to prove that light is a particle. It turns out that, in fact, both are correct; the quantum nature of light has the consequence that it sometimes behaves like a classical wave, and other times like a classical particle. This is called wave-particle duality. Contrary to common misconception, this doesn’t mean that light is “both a wave and a particle”; it simply demonstrates that the classical concepts of “wave” and “particle” are not the proper way to describe reality.
Okay, so light exhibits wave-particle duality. Maybe this makes sense. But matter, which is a tangible thing you can touch, is definitely made of particles, right? To check that, we can replace the beam of light with a beam of electrons. Since we think electrons are particles, not waves, we expect to find on the screen not an interference pattern, but just individual dots corresponding to the individual electron particles. And this is indeed what happens, except… If we run the experiment for some time, and let the electrons build up, then after a while we see that an interference pattern emerges nonetheless! This is shown in Figure 2.5 .
What does this mean? It means that, in quantum physics, both light and matter exhibit wave-particle duality. In classical physics, the measurement of the position of the electron on the screen is deterministic; if we know the initial position and velocity of the electron, then we can predict exactly where the electron lands. In quantum physics, we instead have a probability distribution, which gives us the probability for the electron to be measured at each particular point on the screen. This probability distribution turns out to propagate in space like a wave, and interfere with itself constructively and destructively on the way as a wave does, which is what causes the interference pattern on the screen – it is actually a pattern of probabilities! In the end, the probability will be enlarged on some points of the screen and reduced on other points.
To clarify how the measurement of the positions of the electrons on the screen yields a probability distribution, consider instead a 6-sided die. If you roll the die just once or twice, you won’t have much information about the probabilities to roll each number on the die. This is analogous to sending just a couple of electrons through the slits. What you need to do is to roll the die a large number of times, let’s say 6,000 times. Then you count how many times the die rolled on each number. For example, if it rolled around 1,000 times on each number, then you know the die is fair; but if it rolled around 2,000 times on 6 and around 800 times on every other number, then you know the die is loaded. Similarly, we need to send a large number of electrons through the slits in order to determine the probability distribution for their positions on the screen. It turns out that the position of the electron is “loaded”!
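The die analogy is easy to try out on a computer. The following is a small simulation sketch; the helper name `roll_counts` and the fixed random seed are our own choices, and the weights of the loaded die are picked to match the 2,000 versus 800 example above.

```python
import random
from collections import Counter


def roll_counts(weights, n_rolls, seed=0):
    """Roll a 6-sided die n_rolls times; return how often each face came up."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    faces = [1, 2, 3, 4, 5, 6]
    rolls = rng.choices(faces, weights=weights, k=n_rolls)
    counts = Counter(rolls)
    return [counts[face] for face in faces]


fair = roll_counts([1, 1, 1, 1, 1, 1], 6000)
loaded = roll_counts([800, 800, 800, 800, 800, 2000], 6000)

print("fair die:  ", fair)
print("loaded die:", loaded)
```

With only a handful of rolls the two dice are hard to tell apart; with thousands of rolls the counts settle close to the underlying probabilities, just as the electron dots settle into the interference pattern.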
As an aside, in 21st century terms, the precise answer to the question “is light a wave or a particle?” turns out to be that both of them are different aspects of the same fundamental entity called the quantum electromagnetic field. This field propagates from place to place like a wave, but on the other hand, if you put enough energy into it, you can cause a quantum excitation in the field. It is this excitation that behaves like a particle.
Moreover, it turns out that all elementary particles are quantum fields, and thus all of them exhibit these two aspects. This is called quantum field theory. It neatly unites quantum mechanics with special relativity, and explains elementary particle physics with amazing accuracy – it is actually the most accurate theory in all of science! In this course we will focus on non-relativistic quantum mechanics, which is to quantum field theory as Newtonian physics is to special relativity. Quantum field theory is much more complicated, and is usually only taught at the graduate-school level.
2.1.4 The Stern-Gerlach Experiment
In the Stern-Gerlach experiment, electrically neutral particles, such as silver atoms, are sent through an inhomogeneous magnetic field and onto a screen. For reasons we won’t go into (since they require some knowledge of electrodynamics), the magnetic field will deflect the particle up or down by an amount proportional to its angular momentum. According to classical physics, this angular momentum can have any value, and so we would expect to see the particles hit every possible point along a continuous line on the screen. This is item (4) in Figure 2.6.
However, what actually happens when we perform the experiment is that the particles are deflected either up or down by the exact same amount each time, and hit only two specific discrete points on the screen. This is item (5) in Figure 2.6 . To explain this, we must again use quantum physics. Quantum particles are not seen as classically spinning objects; instead they are said to have an intrinsic form of angular momentum called spin. For particles like electrons or silver atoms, a measurement of spin can only yield one of two options: “spin up” or “spin down”.
The previous experiments we discussed showed us that something that is classically continuous – light, or more generally, electromagnetic radiation – is quantized in the quantum theory into discrete packets or quanta of energy called photons. Similarly, the Stern-Gerlach experiment tells us that another classically continuous thing, angular momentum, is also quantized in the quantum theory – into discrete spin. This seems to be a general property of most, but not all, quantum systems: something that in classical physics was continuous turns out to actually be discrete in quantum physics.
Finally, let me just mention that one can use spin to create qubits, or “quantum bits”, where “spin up” represents a value of 0 and “spin down” represents a value of 1. Because spin is a quantum quantity, it satisfies all of the weird properties of quantum mechanics that we will discuss later. By taking advantage of these quantum properties, we can potentially do calculations faster with a quantum computer that uses qubits compared to a classical computer that uses classical bits.
2.2 Quantum vs. Classical Mechanics
Let us now summarize, in a non-technical way, the most important features of quantum mechanics and how they differ from their classical-mechanical counterparts.
Quantum mechanics is, as far as we know, the exact and fundamental theory of reality. Classical mechanics turns out to be just an approximation to this theory. This means that, in general, all modern theories of physics must be quantum theories if they intend to be fundamental. One important exception to that rule is general relativity, which we do not yet know how to describe as a quantum theory; if we did, we would call that theory quantum gravity. However, this is usually not a problem, since general relativity is mostly needed only when describing huge things like planets, stars, galaxies, and so on, in which case we do not need quantum mechanics since we are within the realm of validity of the classical approximation. In fact, this leads us to the next property:
Quantum mechanics is the theory of the smallest things. This includes elementary particles, atoms, and molecules. Since all big things are made of small things, quantum mechanics also describes humans, planets, galaxies, and the whole universe. However, this is exactly where the classical limit comes in; when many small quantum systems make up one big system, classical mechanics generally turns out to be a good enough description for all practical purposes. This is similar to how relativity is always the correct way to describe physics, but at low velocities, much smaller than the speed of light, Newtonian physics is a good enough approximation.
Quantum mechanics usually involves discrete things. This is in contrast with classical mechanics, which usually involves continuous things. In fact, continuous classical things generally turn out to be made of discrete quantum things. We saw an example of this when we discussed how light – a continuous electromagnetic field – is actually made of discrete photons. Similarly, we saw that angular momentum, which is continuous in the classical theory, is replaced by discrete spin in the quantum theory.
Quantum mechanics is a probabilistic theory. Classical mechanics, on the other hand, is a deterministic theory. For example, in classical mechanics, given a particle’s exact position and momentum at any one time, we can (in principle) predict its position and momentum at any other time – with absolute certainty. However, in quantum mechanics, the most we can ever hope to know is the probability distribution to find the particle at a certain position or with a certain momentum. This is illustrated in Figure 2.7 .
Quantum mechanics allows for superposition of states. In classical mechanics, the state of a particle is simply given by the exact values of its position and momentum. In contrast, in quantum mechanics the particle can – in fact, usually must – be in a superposition of possible positions and momenta. Each one of the possibilities in the superposition has a probability assigned to it, and this is where the probability distribution in Figure 2.7 comes from.
Quantum mechanics features uncertainty in measurements. This is called the uncertainty principle. In classical mechanics, at least theoretically, we can precisely know both the position and momentum of the particle. However, in quantum mechanics, the more we know about the position, the less we know about the momentum – and vice versa. If the position probability distribution is narrow and concentrated at a certain region, meaning that there is low uncertainty in the position, then one can prove that the momentum probability distribution must be wide, meaning that there is high uncertainty in the momentum. The opposite is also true. This is again illustrated in Figure 2.7 .
Quantum mechanics has a stronger type of correlation called entanglement. Classical mechanics also allows for correlation. For example, let’s say I have two sealed envelopes with notes inside them, one with the number 0 and the other with the number 1. I give one to Alice and one to Bob. If Alice opens her envelope and sees the number 0, she can be sure that Bob has the envelope with the number 1, and vice versa. The results are clearly correlated. However, if we replace the notes with qubits – quantum bits which are in a superposition of 0 and 1 – then the envelopes are now correlated more strongly via quantum entanglement. We will discuss later in exactly what way quantum entanglement is stronger than classical correlation, but right now we will note that this fact is what gives quantum computers their power.
3 Mathematical Background
Quantum theory is the theoretical framework believed to describe all aspects of our universe at the most fundamental level. Mathematically, as we will see, it is relatively simple, although much more abstract than classical physics. However, conceptually, it is very hard to understand using the classical intuition we have from our daily lives. In these lectures we will learn to develop quantum intuition.
In this chapter we shall learn some basic mathematical concepts, focusing on complex numbers, linear algebra, and probability theory, which will be used extensively throughout the course. Even if the student is already familiar with these concepts, it is still a good idea to go over this chapter, since the unique notation commonly used in quantum mechanics is different from the notation used elsewhere in mathematics and physics.
3.1 Complex Numbers
Complex numbers are at the very core of the mathematical formulation of quantum theory. In this section we will give a review of complex numbers and present some definitions and results that will be used throughout the course.
In real life, we only encounter real numbers. These numbers form a field, that is, a set of elements with well-defined operations of addition, subtraction, multiplication, and division. This field is denoted $\mathbb{R}$. Geometrically, we can imagine $\mathbb{R}$ as a 1-dimensional line, stretching from $-\infty$ to $+\infty$.
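Unfortunately, it turns out that the field of real numbers has a serious flaw. One can write down completely reasonable-looking quadratic equations, with only real coefficients, which nonetheless have no solutions in $\mathbb{R}$. Consider the most general quadratic equation:
$$a x^2 + b x + c = 0, \qquad a, b, c \in \mathbb{R}, \quad a \neq 0.$$
One can easily prove (by completing the square) that there are two potential solutions, given by
$$x_\pm = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
Here, one solution corresponds to the choice $+$ and the other one to $-$. However, the square root poses a problem, because the square of a real number is always non-negative (here, $\forall$ means “for all”):
$$x^2 \geq 0, \qquad \forall x \in \mathbb{R}.$$
The number (and existence) of real solutions is thus determined by the sign of the expression inside the square root, called the discriminant $\Delta$:
$$\Delta \equiv b^2 - 4ac \quad\Longrightarrow\quad \begin{cases} \Delta > 0: & \text{two real solutions}, \\ \Delta = 0: & \text{one real solution}, \\ \Delta < 0: & \text{no real solutions}. \end{cases}$$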
It would be very convenient (not to mention more elegant) to have a field of numbers that is algebraically closed, meaning that every non-constant polynomial (and in particular, a quadratic polynomial) with coefficients in the field has a root in the field.
Since the problem stems from the fact that no real number can square to a negative number, let us simply extend our field with just one number, the imaginary unit, denoted $\mathrm{i}$, whose sole purpose is to square to a negative number. (We use non-italic font exclusively for $\mathrm{i}$ in order to distinguish it from $i$, which will be used for labels and variables. Of course, it is usually a wise idea not to have both $\mathrm{i}$ and $i$ in the same equation in the first place, but sometimes that is unavoidable.) The most natural choice is for $\mathrm{i}$ to square to $-1$:
$$\mathrm{i}^2 \equiv -1.$$
The new field created by extending $\mathbb{R}$ with $\mathrm{i}$ is the field of complex numbers, denoted $\mathbb{C}$. A general complex number is written
$$z = a + \mathrm{i} b,$$
where $a$ is called the real part and $b$ is called the imaginary part, both real numbers.
Now, in the quadratic equation, having $\sqrt{\Delta}$ with a negative discriminant $\Delta$ is no longer a problem, since the number $\mathrm{i}\sqrt{-\Delta}$ squares to $\Delta$:
$$\left(\mathrm{i}\sqrt{-\Delta}\right)^2 = \mathrm{i}^2 \,(-\Delta) = \Delta.$$
Therefore, we conclude that every quadratic equation has a solution in the field of complex numbers (note that real numbers are a special case of complex numbers, so the two real roots are also two complex roots):
- $\Delta > 0$: Two real roots
- $\Delta = 0$: One real root
- $\Delta < 0$: Two complex roots
As a matter of fact, this is a special case of the fundamental theorem of algebra: any polynomial of degree $n \geq 1$ with complex coefficients (again, real numbers are a special case of complex numbers, so the coefficients can be all real) has at least one, and at most $n$, distinct complex roots. Equivalently, it has exactly $n$ not necessarily distinct complex roots, accounting for possible degeneracy/multiplicity. For example, for $\Delta = 0$ the quadratic equation has two degenerate roots, or one root of multiplicity 2. The quadratic equation corresponds to the case $n = 2$.
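As a quick illustration of the three cases, here is a minimal sketch of a quadratic solver that always returns two complex roots. The function name `solve_quadratic` is our own; the only library call is `cmath.sqrt` from the Python standard library, which accepts negative arguments. Note that Python writes the imaginary unit as `j`.

```python
import cmath


def solve_quadratic(a, b, c):
    """Return both roots of a*x**2 + b*x + c = 0 as complex numbers (a != 0)."""
    if a == 0:
        raise ValueError("not a quadratic equation: a must be nonzero")
    discriminant = b * b - 4 * a * c
    # cmath.sqrt returns a complex result, so a negative discriminant is fine.
    root = cmath.sqrt(discriminant)
    return (-b + root) / (2 * a), (-b - root) / (2 * a)


print(solve_quadratic(1, -3, 2))  # positive discriminant: two real roots
print(solve_quadratic(1, 2, 1))   # zero discriminant: one repeated root
print(solve_quadratic(1, 0, 1))   # negative discriminant: two complex roots
```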
A. Solve the quadratic equation
B. Find the quadratic equation whose solutions are .
Above we saw that the equation $a x^2 + b x + c = 0$ with $a, b, c \in \mathbb{R}$ can either have two real solutions, one real solution, or two complex solutions that are conjugates of each other.
A. Imaginary numbers (sometimes also called purely imaginary numbers) are numbers of the form $\mathrm{i} b$ for $b \in \mathbb{R}$. What kind of equation has two imaginary solutions that are complex conjugates of each other?
B. What kind of equation has two imaginary solutions that are in general not complex conjugates of each other?
C. What kind of equation has two arbitrary complex solutions that are in general not complex conjugates of each other?
Note: In all of the above, don’t just find a specific equation that has this property – find a family of equations with arbitrary parameters of certain types.
3.1.2 Operations on Complex Numbers
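Complex numbers can be added and multiplied with other complex numbers. There is really nothing special about these operations, except that it is customary to group the imaginary parts (i.e. anything that is a multiple of $\mathrm{i}$) together and turn $\mathrm{i}^2$ into $-1$ in the final result. For real $a, b, c, d$:
$$(a + \mathrm{i} b) + (c + \mathrm{i} d) = (a + c) + \mathrm{i}\,(b + d),$$
$$(a + \mathrm{i} b)(c + \mathrm{i} d) = ac + \mathrm{i}\, ad + \mathrm{i}\, bc + \mathrm{i}^2 bd = (ac - bd) + \mathrm{i}\,(ad + bc).$$
Next, note that the two solutions to a quadratic equation with $\Delta < 0$ are the same, up to the sign of $\mathrm{i}$. That is, if we replace $\mathrm{i}$ with $-\mathrm{i}$ in one of the solutions, we get the other solution. Such numbers are called complex conjugates, and the process of replacing $\mathrm{i}$ with $-\mathrm{i}$ is called complex conjugation. The complex conjugate of $z$ is denoted $z^*$:
$$z = a + \mathrm{i} b \quad\Longrightarrow\quad z^* = a - \mathrm{i} b.$$
Of course, the conjugate of the conjugate is the original number:
$$(z^*)^* = z.$$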
This means that the complex conjugation operation is an involution, that is, its own inverse.
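Complex conjugation allows us to write a general formula for the real or imaginary parts of a complex number, denoted $\operatorname{Re} z$ and $\operatorname{Im} z$ respectively:
$$\operatorname{Re} z = \frac{z + z^*}{2}, \qquad \operatorname{Im} z = \frac{z - z^*}{2\mathrm{i}}.$$
You can check that if $z = a + \mathrm{i} b$ then we get $\operatorname{Re} z = a$ and $\operatorname{Im} z = b$, as expected.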
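These operations can be checked directly with Python's built-in complex type. In this sketch the helper names `real_part` and `imag_part` are hypothetical; they extract the real and imaginary parts using only addition, subtraction, and conjugation, and are compared against the built-in `.real` and `.imag` attributes. Python writes the imaginary unit as `j`.

```python
def real_part(z):
    """Real part of z, computed as (z + conjugate of z) / 2."""
    return (z + z.conjugate()) / 2


def imag_part(z):
    """Imaginary part of z, computed as (z - conjugate of z) / (2i)."""
    return (z - z.conjugate()) / 2j


z = 3 + 4j
w = 1 - 2j

print(z + w)                        # parts are added separately
print(z * w)                        # i squared has been turned into -1
print(z.conjugate())                # sign of the imaginary part flipped
print(real_part(z), imag_part(z))   # complex values with zero imaginary part
```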
What are the real and imaginary parts of ? What is its complex conjugate?
If a number is the complex conjugate of itself, can you say anything interesting about that number? What about if a number is minus the complex conjugate of itself?
3.1.3 The Complex Plane and Real 2-Vectors
Recall that the field of real numbers $\mathbb{R}$ is geometrically a line. The space $\mathbb{R}^n$ is an $n$-dimensional space which is home to real $n$-vectors, that is, ordered lists of $n$ real numbers of the form $(v_1, \ldots, v_n)$. In particular, $\mathbb{R}^2$ is geometrically a plane, with vectors of the form $(x, y)$.
The complex plane $\mathbb{C}$ is similar to $\mathbb{R}^2$, except that instead of the $x$ and $y$ axes we have the real and imaginary axes respectively. The real unit $1$, which squares to $1$, defines the positive direction of the real axis, while the imaginary unit $\mathrm{i}$, which squares to $-1$, defines the positive direction of the imaginary axis. This is illustrated in Figure 3.1.
Since $\mathbb{C}$ is a plane, we can define vectors on it, just like on $\mathbb{R}^2$. A real 2-vector $(a, b)$ is an arrow in $\mathbb{R}^2$ which points from the origin to the point that is $a$ steps in the direction of the $x$ axis and $b$ steps in the direction of the $y$ axis. A complex number $z = a + \mathrm{i} b$ is similarly an arrow in $\mathbb{C}$ which points from the origin to the point that is $a$ steps along the real axis and $b$ steps along the imaginary axis.
The complex conjugate $z^*$ is obtained by replacing $\mathrm{i}$ with $-\mathrm{i}$. Since $\mathrm{i}$ defines the direction of the imaginary axis, this is equivalent to flipping the imaginary axis. In other words, $z^*$ is the reflection of $z$ across the real axis, as shown in Figure 3.1.
From the Pythagorean theorem, we know that the magnitude (or length) of the real 2-vector $(a, b)$ is $\sqrt{a^2 + b^2}$. The magnitude or absolute value $|z|$ of the complex number $z = a + \mathrm{i} b$ is also $\sqrt{a^2 + b^2}$. (Inspect Figure 3.1 to see how the Pythagorean theorem fits in.) Furthermore, since $z^*$ is just a reflection of $z$, they both have the same magnitude. A convenient way to calculate the magnitude of either $z$ or $z^*$ is to multiply them with each other:
$$z^* z = (a - \mathrm{i} b)(a + \mathrm{i} b) = a^2 + b^2 = |z|^2.$$
For an abstract complex number $z$ (where we don’t necessarily know the explicit values of the real and imaginary parts) one can also write
$$|z| = \sqrt{z^* z}.$$
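As a small numerical check, the magnitude can be computed by multiplying a number with its conjugate and taking the square root. This is a sketch with a hypothetical helper name, `magnitude`; the built-in `abs` gives the same answer and is shown for comparison.

```python
import math


def magnitude(z):
    """Magnitude of z via the product of z with its complex conjugate."""
    product = z * z.conjugate()  # always real and non-negative
    return math.sqrt(product.real)


z = 3 + 4j
print(magnitude(z))              # same as the length of the 2-vector (3, 4)
print(magnitude(z.conjugate()))  # the reflection has the same magnitude
print(abs(z))                    # Python's built-in absolute value
```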
We note that there is an isomorphism between complex numbers and real 2-vectors. An isomorphism between two spaces is a mapping between the spaces that can be taken in either direction (i.e. is invertible), and preserves the structure of each space. The isomorphism between $\mathbb{C}$ and $\mathbb{R}^2$ is given by:
$$a + \mathrm{i} b \quad\longleftrightarrow\quad (a, b).$$
We have already seen that the norm operation is preserved. Similarly, addition of complex numbers
$$(a + \mathrm{i} b) + (c + \mathrm{i} d) = (a + c) + \mathrm{i}\,(b + d)$$
maps into addition of 2-vectors
$$(a, b) + (c, d) = (a + c, b + d).$$
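The statement that addition is preserved can be spelled out in a few lines. In this sketch, 2-vectors are represented as Python tuples, and the helper names `to_vector`, `from_vector`, and `add_vectors` are our own illustrative choices.

```python
def to_vector(z):
    """Map a complex number to the corresponding real 2-vector."""
    return (z.real, z.imag)


def from_vector(v):
    """Map a real 2-vector back to the corresponding complex number."""
    return complex(v[0], v[1])


def add_vectors(u, v):
    """Component-wise addition of two real 2-vectors."""
    return (u[0] + v[0], u[1] + v[1])


z = 1 + 2j
w = 3 - 5j

# Adding first and then mapping gives the same result as
# mapping first and then adding.
print(to_vector(z + w))
print(add_vectors(to_vector(z), to_vector(w)))
```

The mapping is invertible, since `from_vector(to_vector(z))` returns the original number.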
Let and .
A. Calculate , , , , , , , , and .
B. Find the 2-vectors isomorphic to and .
Show that multiplication of a vector by a real number and reflection of a vector with respect to the $x$ and $y$ axes map to equivalent operations on the corresponding complex numbers.
3.1.4 Polar Coordinates and Complex Phases
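A vector in $\mathbb{R}^2$ can be converted from Cartesian coordinates $(x, y)$ to polar coordinates $(r, \phi)$. The coordinate $r$ is the magnitude of the vector, and the coordinate $\phi$ is the angle that the vector makes with respect to the $x$ axis. The relation between the coordinate systems is given by
$$x = r \cos\phi, \qquad y = r \sin\phi, \qquad r = \sqrt{x^2 + y^2}, \qquad \phi = \arctan\frac{y}{x}. \tag{3.3}$$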
This simply follows from the definitions of $\cos$ and $\sin$, since the vector creates a right triangle with the $x$ axis (see Figure 3.1). For example, the vector $(1, 1)$ in Cartesian coordinates corresponds to $r = \sqrt{2}$ and $\phi = \pi/4$.
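$x$ and $y$ can be any real numbers, but $r$ must be non-negative and $\phi$ must be in the range $(-\pi, \pi]$ (in radians) where $\phi = 0$ corresponds to the positive $x$ axis. However, there is a subtlety here: the range of the function $\arctan$ is $(-\pi/2, \pi/2)$, so $\phi$ needs to be further adjusted according to the quadrant. One can instead use a more complicated definition that automatically takes the quadrant into account:
$$\phi = \begin{cases} \arctan(y/x) & \text{if } x > 0, \\ \arctan(y/x) + \pi & \text{if } x < 0 \text{ and } y \geq 0, \\ \arctan(y/x) - \pi & \text{if } x < 0 \text{ and } y < 0, \\ +\pi/2 & \text{if } x = 0 \text{ and } y > 0, \\ -\pi/2 & \text{if } x = 0 \text{ and } y < 0, \\ \text{undefined} & \text{if } x = 0 \text{ and } y = 0. \end{cases}$$
This function is sometimes called $\operatorname{atan2}(y, x)$, and it is implemented in most programming languages. Note that $\phi$ is undefined at the origin since a vector of length zero does not point in any direction.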
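In Python, for instance, this function is available as `math.atan2(y, x)` and returns an angle in the range from $-\pi$ to $\pi$. The sketch below compares it with the naive arctangent; the helper names `to_polar` and `naive_angle` are hypothetical. One design choice is ours: Python's `math.atan2(0, 0)` returns `0.0` rather than failing, so `to_polar` raises an error at the origin to match the statement that the angle is undefined there.

```python
import math


def to_polar(x, y):
    """Convert Cartesian (x, y) to polar (r, phi), with phi from atan2."""
    if x == 0 and y == 0:
        raise ValueError("the angle is undefined at the origin")
    return math.hypot(x, y), math.atan2(y, x)


def naive_angle(x, y):
    """Angle from the plain arctangent; ignores the quadrant (needs x != 0)."""
    return math.atan(y / x)


# (1, 1) and (-1, -1) point in opposite directions, but y/x is the same.
print(naive_angle(1, 1), naive_angle(-1, -1))
print(to_polar(1, 1), to_polar(-1, -1))
```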
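Given that complex numbers are isomorphic to real 2-vectors, we should be able to write complex numbers in polar coordinates as well. Looking at (3.3), and replacing $x$ and $y$ with $a$ and $b$, we see that
$$z = a + \mathrm{i} b = r \cos\phi + \mathrm{i}\, r \sin\phi = r \left(\cos\phi + \mathrm{i} \sin\phi\right).$$
We can write this more compactly using Euler’s formula:
$$e^{\mathrm{i}\phi} = \cos\phi + \mathrm{i} \sin\phi \quad\Longrightarrow\quad z = r\, e^{\mathrm{i}\phi}.$$
This is illustrated in Figure 3.1. In this context, the angle $\phi$ is called the complex phase. It is of extreme importance in quantum mechanics, as we shall see.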
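Euler's formula, $e^{\mathrm{i}\phi} = \cos\phi + \mathrm{i}\sin\phi$, can be verified numerically with Python's standard `cmath` module. In this sketch, `from_polar` is a hypothetical helper that builds a complex number from a magnitude and a phase, while `cmath.polar` is the standard-library function that goes the other way.

```python
import cmath
import math


def from_polar(r, phi):
    """Build the complex number with magnitude r and complex phase phi."""
    return r * cmath.exp(1j * phi)


phi = 0.75  # an arbitrary angle in radians
print(cmath.exp(1j * phi))                   # left-hand side of Euler's formula
print(complex(math.cos(phi), math.sin(phi)))  # right-hand side

z = from_polar(2.0, math.pi / 2)  # magnitude 2, pointing along the imaginary axis
print(z)
print(cmath.polar(z))             # recovers (magnitude, phase)
```

Because of floating-point rounding, the real part of `z` comes out as a tiny nonzero number instead of exactly zero.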
Write in polar coordinates.
Prove, using Euler’s formula, that , that is, the magnitude of the complex number is . If , what is ?
Prove Euler’s formula. (You may need to use some calculus.)
3.2 Linear Algebra
The most important and fundamental mathematical structure in quantum theory is the Hilbert space, a type of complex vector space. In this section we will define Hilbert spaces and learn about many important concepts and results from linear algebra that apply to them.
3.2.1 Complex Vector Spaces
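A real $n$-vector is an ordered list of $n$ real numbers. Analogously, a complex $n$-vector is an ordered list of $n$ complex numbers. For example, a complex 2-vector with two complex components $\Psi_1$ and $\Psi_2$ is written as:
\[ |\Psi\rangle = \begin{pmatrix} \Psi_1 \\ \Psi_2 \end{pmatrix}. \]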
The notation $|\Psi\rangle$ is unique to quantum mechanics, and it is called bra-ket notation or sometimes Dirac notation. In this notation, we write a straight line $|$ and an angle bracket $\rangle$, and between them, a label. We will usually denote a general vector with the label $\Psi$; this label, and its lowercase counterpart $\psi$, are very commonly used in quantum mechanics. However, we can use whatever label we want to describe our vector – including letters, numbers, symbols, or even whole words and sentences, for example: $|A\rangle$, $|3\rangle$, $|+\rangle$, $|\text{Alice}\rangle$, $|\text{the cat is alive}\rangle$, and so on.
This is a great advantage of the bra-ket notation, as it allows us to be very descriptive in the labels we choose for our vectors – which we can’t do with the notation $\vec{v}$ or $\mathbf{v}$ commonly used for vectors in mathematics and physics.
A vector space $\mathcal{V}$ over a field $\mathbb{F}$ is a set of vectors equipped with two operations: addition of vectors and multiplication of a vector by a scalar, where a scalar is any number from the field $\mathbb{F}$. (Footnote 8: The field is usually taken to be $\mathbb{R}$ or $\mathbb{C}$. Naturally, for a complex vector space, it will be $\mathbb{C}$.) Vector addition must satisfy the following conditions:
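Closed – the sum of two vectors is another vector in the same space: $|\Psi\rangle + |\Phi\rangle \in \mathcal{V}$ for all $|\Psi\rangle, |\Phi\rangle \in \mathcal{V}$.
Commutative – the order of vectors doesn’t matter: $|\Psi\rangle + |\Phi\rangle = |\Phi\rangle + |\Psi\rangle$.
Associative – if three vectors are added, it doesn’t matter which two are added first: $\left(|\Psi\rangle + |\Phi\rangle\right) + |\Theta\rangle = |\Psi\rangle + \left(|\Phi\rangle + |\Theta\rangle\right)$.
Identity vector or zero vector – there is a (unique) vector $0$ which, when added to any vector, does not change it: $|\Psi\rangle + 0 = |\Psi\rangle$. (Footnote 9: Note that here we are using a slight abuse of notation by denoting the zero vector as the number $0$, instead of using bra-ket notation. The reason is that $|0\rangle$ already has a special common meaning in quantum mechanics, as we will see later; in the context of that special meaning, $|0\rangle$ is not the zero vector.)
Inverse vector – for every vector $|\Psi\rangle$ there exists another (unique) vector $-|\Psi\rangle$ such that the two vectors sum to the zero vector: $|\Psi\rangle + \left(-|\Psi\rangle\right) = 0$.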
Furthermore, multiplication by a scalar must satisfy the following conditions:
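Closed – the product of a vector and a scalar is a vector in the same space: $\alpha|\Psi\rangle \in \mathcal{V}$ for all $\alpha \in \mathbb{F}$ and $|\Psi\rangle \in \mathcal{V}$.
Associative – if two scalars are multiplied by a vector, it doesn’t matter whether we first multiply the two scalars or we first multiply one of the scalars with the vector: $(\alpha\beta)|\Psi\rangle = \alpha\left(\beta|\Psi\rangle\right)$.
Distributive over addition of scalars: $(\alpha + \beta)|\Psi\rangle = \alpha|\Psi\rangle + \beta|\Psi\rangle$.
Distributive over addition of vectors: $\alpha\left(|\Psi\rangle + |\Phi\rangle\right) = \alpha|\Psi\rangle + \alpha|\Phi\rangle$.
Identity scalar or unit scalar – there is a (unique) scalar $1$ which, when multiplied by any vector, does not change it: $1|\Psi\rangle = |\Psi\rangle$.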
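We now define a 2-dimensional complex vector space, which we denote $\mathbb{C}^2$, as the space of complex 2-vectors over $\mathbb{C}$, with addition of vectors given by
\[ |\Psi\rangle + |\Phi\rangle = \begin{pmatrix} \Psi_1 \\ \Psi_2 \end{pmatrix} + \begin{pmatrix} \Phi_1 \\ \Phi_2 \end{pmatrix} = \begin{pmatrix} \Psi_1 + \Phi_1 \\ \Psi_2 + \Phi_2 \end{pmatrix} \]
and multiplication of a vector by a scalar given by
\[ \lambda|\Psi\rangle = \lambda\begin{pmatrix} \Psi_1 \\ \Psi_2 \end{pmatrix} = \begin{pmatrix} \lambda\Psi_1 \\ \lambda\Psi_2 \end{pmatrix}. \]
The $n$-dimensional complex vector space $\mathbb{C}^n$ is defined analogously. In this course, we will mostly focus on $\mathbb{C}^2$ for simplicity, in particular when giving explicit examples.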
Check that the addition and multiplication as defined above indeed satisfy all of the required conditions for a vector space. You can do this just for $\mathbb{C}^2$, for simplicity.
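The component-wise operations on $\mathbb{C}^2$ can be tried out numerically. The sketch below assumes NumPy is available and uses arbitrary sample vectors and scalars of our own choosing; it spot-checks a few of the conditions for specific values, which is of course not a proof.

```python
import numpy as np

psi = np.array([1 + 2j, 3 - 1j])
phi = np.array([0.5j, -2 + 0j])
alpha, beta = 2 - 1j, 1j

total = psi + phi        # component-wise addition of vectors
scaled = alpha * psi     # component-wise multiplication by a scalar

# Spot checks of the vector space conditions for these sample values.
checks = {
    "commutative": np.allclose(psi + phi, phi + psi),
    "scalar associative": np.allclose((alpha * beta) * psi, alpha * (beta * psi)),
    "distributive (scalars)": np.allclose((alpha + beta) * psi, alpha * psi + beta * psi),
    "distributive (vectors)": np.allclose(alpha * (psi + phi), alpha * psi + alpha * phi),
    "zero vector": np.allclose(psi + np.zeros(2), psi),
    "inverse": np.allclose(psi + (-psi), np.zeros(2)),
}
```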
3.2.2 Dual Vectors, Inner Products, Norms, and Hilbert Spaces
A dual vector is defined by writing the vector as a row instead of a column, and replacing each component with its complex conjugate. We denote the dual vector of $|\Psi\rangle$ as follows:
\[ \langle\Psi| = \begin{pmatrix} \Psi_1^* & \Psi_2^* \end{pmatrix}. \]
In terms of notation, there is now an opposite angle bracket on the left of the label, and the straight line is on the right. Addition and multiplication by a scalar are defined as for vectors, simply replacing columns with rows. However, you may not add vectors and dual vectors together – adding a row to a column is undefined!
If we are given a dual vector, we can take its dual to get a “normal” (column) vector. In this case, the operation of taking the dual involves writing the vector as a column instead of a row and taking the complex conjugates of the components. This means that the operation of taking the dual is an involution – taking the dual of a vector twice gives back the same vector, since $(z^*)^* = z$.
Using dual vectors, we may define the inner product. This product allows us to take a vector and a dual vector and produce a (complex) number out of them, similarly to the dot product of real vectors. Importantly, the inner product only works for one vector and one dual vector, not for two vectors or two dual vectors. To calculate it, we multiply the components of both vectors one by one and add them up:
\[ \langle\Psi|\Phi\rangle = \begin{pmatrix} \Psi_1^* & \Psi_2^* \end{pmatrix} \begin{pmatrix} \Phi_1 \\ \Phi_2 \end{pmatrix} = \Psi_1^*\Phi_1 + \Psi_2^*\Phi_2. \]
(Footnote 10: The dot product of the real vectors $\mathbf{v} = (v_1, v_2)$ and $\mathbf{w} = (w_1, w_2)$ in $\mathbb{R}^2$ is defined as $\mathbf{v} \cdot \mathbf{w} = v_1 w_1 + v_2 w_2$. In principle, this definition does secretly involve a dual (row) vector and a (column) vector, but since we do not need to take the complex conjugate, we don’t really need to worry about dual vectors. However, it is important to note that in real vector spaces with curvature, such as those used in general relativity, the dot product must be replaced with a more complicated inner product which involves the metric, and it again becomes crucial to distinguish vectors from dual vectors – which in this context are also called contravariant and covariant vectors respectively.)
In bra-ket notation, vectors $|\Psi\rangle$ are called “kets” and dual vectors $\langle\Psi|$ are called “bras”. Then the notation $\langle\Psi|\Phi\rangle$ for the inner product is called a “bra(c)ket”.
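We define the norm-squared of a vector by taking its inner product with its dual (“squaring” it):
\[ \|\Psi\|^2 = \langle\Psi|\Psi\rangle = |\Psi_1|^2 + |\Psi_2|^2, \]
where the magnitude-squared of a complex number $z$ was defined in Section 3.1.3 as $|z|^2 = z^* z$. Then we can define the norm as the square root of the norm-squared:
\[ \|\Psi\| = \sqrt{\langle\Psi|\Psi\rangle}. \]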
Observe how taking the dual of a vector generalizes taking the complex conjugate of a number, and taking the norm of a vector generalizes taking the magnitude of a number; indeed, for 1-dimensional vectors, these operations are the same!
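For concreteness, here is a sketch of the inner product and the norm using NumPy, with sample vectors chosen by us. Note that `np.vdot` takes the complex conjugate of its first argument, which matches the conjugation that happens when a ket is turned into a bra.

```python
import numpy as np

psi = np.array([1 + 1j, 2j])
phi = np.array([3 + 0j, 1 - 1j])

inner = np.vdot(psi, phi)    # <Psi|Phi>: vdot conjugates its FIRST argument
by_hand = np.conj(psi[0]) * phi[0] + np.conj(psi[1]) * phi[1]

norm_squared = np.vdot(psi, psi).real   # <Psi|Psi> is real and non-negative
norm = np.linalg.norm(psi)              # square root of the norm-squared
```

A plain `np.dot(psi, phi)` would skip the conjugation and give a different number for complex vectors, so it is not the inner product described here.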
A vector space with an inner product is called a Hilbert space, provided it is also a complete metric space and that the inner product satisfies the same properties (which you will derive in problems 3.13, 3.14, and 3.15) as the standard inner product on $\mathbb{C}^n$. In particular, $\mathbb{C}^n$ itself is a Hilbert space, but there are many other Hilbert spaces, some of them much more abstract. The usual notation for a general Hilbert space is $\mathcal{H}$.
(Footnote 11: A vector space is a complete metric space if whenever an infinite series of vectors $|\Psi_i\rangle$ converges absolutely, that is, the series of the norms of the vectors converges: $\sum_{i=1}^{\infty} \|\Psi_i\| < \infty$, then the series of the vectors themselves converges as well, to some vector $|\Psi\rangle$ in the Hilbert space: $\sum_{i=1}^{\infty} |\Psi_i\rangle = |\Psi\rangle$.)
Calculate , , , , , and .
Prove that the norm-squared $\|\Psi\|^2$ is always non-negative, and it is zero if and only if $|\Psi\rangle$ is the zero vector, that is, the vector whose components are all zero. In other words, the inner product is positive-definite. As a corollary, explain why we must take the complex conjugate of the components when we convert a vector to a dual vector. (What would have happened if we didn’t?)
Prove that $\langle\Phi|\Psi\rangle = \langle\Psi|\Phi\rangle^*$, that is, if we swap the order of vectors in the inner product we get the complex conjugate of the original product. Thus, unlike the dot product, the inner product on $\mathbb{C}^n$ is not symmetric. However, it is conjugate-symmetric, and in particular, the magnitude of the inner product remains the same, since $|z^*| = |z|$.
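Prove that if $\alpha, \beta \in \mathbb{C}$ and $|\Psi\rangle, |\Phi\rangle, |\Theta\rangle \in \mathbb{C}^n$ then
\[ \langle\Psi|\left(\alpha|\Phi\rangle + \beta|\Theta\rangle\right) = \alpha\langle\Psi|\Phi\rangle + \beta\langle\Psi|\Theta\rangle, \]
that is, the inner product is linear in its second argument.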
3.2.3 Orthonormal Bases
An orthonormal basis of $\mathbb{C}^n$ is a set of $n$ non-zero vectors $\{|B_1\rangle, \dots, |B_n\rangle\}$ – which we will usually denote $|B_i\rangle$ for short, with the implication that $i \in \{1, \dots, n\}$ – such that:
1. They span $\mathbb{C}^n$, which means that any vector $|\Psi\rangle \in \mathbb{C}^n$ can be written uniquely as a linear combination of the basis vectors, that is, a sum of the vectors $|B_i\rangle$ multiplied by some complex numbers $\lambda_i \in \mathbb{C}$:
\[ |\Psi\rangle = \sum_{i=1}^{n} \lambda_i |B_i\rangle. \]
This property ensures that the basis can be used to define any single vector in the space $\mathbb{C}^n$, not just part of that space.
As a simple example, in $\mathbb{R}^3$ the vector $\hat{x}$ pointing along the $x$ axis and the vector $\hat{y}$ pointing along the $y$ axis span the $xy$ plane, but not all of $\mathbb{R}^3$. To get a basis for all of $\mathbb{R}^3$, we must add an appropriate third vector, such as the vector $\hat{z}$ pointing along the $z$ axis. (But other vectors, such as $\hat{x} + \hat{y} + \hat{z}$, would work as well.)
2. They are linearly independent, in that if the zero vector is a linear combination of the basis vectors, then the coefficients in the linear combination must all be zero:
\[ \sum_{i=1}^{n} \lambda_i |B_i\rangle = 0 \quad\Longrightarrow\quad \lambda_i = 0 \ \text{ for all } i. \]
Linear independence means (as you will show in Problem 3.17) that no vector in the set can be written as a linear combination of the other vectors in the set. If we could have done so, then that vector would have been redundant, and we would have needed to remove it in order to obtain a basis.
As a simple example, the set composed of $\hat{x}$, $\hat{y}$, and $\hat{x} + \hat{y}$ is linearly dependent, since $1 \cdot \hat{x} + 1 \cdot \hat{y} - 1 \cdot (\hat{x} + \hat{y}) = 0$ with non-zero coefficients, but the set $\{\hat{x}, \hat{y}\}$ is linearly independent.
3. They are all orthogonal to each other, that is, the inner product of any two different vectors evaluates to zero:
\[ \langle B_i|B_j\rangle = 0 \quad \text{for all } i \neq j. \]
4. They are all unit vectors, that is, they have a norm (and norm-squared) of 1:
\[ \|B_i\|^2 = \langle B_i|B_i\rangle = 1 \quad \text{for all } i. \]
In fact, properties 3 and 4 may be expressed more compactly as:
\[ \langle B_i|B_j\rangle = \delta_{ij} = \begin{cases} 1 & i = j, \\ 0 & i \neq j, \end{cases} \]
where $\delta_{ij}$ is called the Kronecker delta. If this combined property is satisfied, we say that the vectors are orthonormal. (Footnote 12: Actually, bases don’t have to be orthonormal in general, but in quantum mechanics they always are, for reasons that will become clear later.)
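These requirements become much simpler in $n = 2$ dimensions. An orthonormal basis for $\mathbb{C}^2$ is a set of 2 non-zero vectors $|B_1\rangle, |B_2\rangle$ such that:
1. They span $\mathbb{C}^2$, which means that any vector $|\Psi\rangle \in \mathbb{C}^2$ can be written as a linear combination of the basis vectors:
\[ |\Psi\rangle = \lambda_1 |B_1\rangle + \lambda_2 |B_2\rangle \]
for a unique choice of $\lambda_1, \lambda_2 \in \mathbb{C}$.
2. They are linearly independent, which means that we cannot write one in terms of a scalar times the other, i.e.:
\[ |B_1\rangle \neq \lambda |B_2\rangle \quad \text{for all } \lambda \in \mathbb{C}. \]
3. They are orthonormal to each other, that is, the inner product between them evaluates to zero and both of them have unit norm:
\[ \langle B_1|B_2\rangle = 0, \qquad \|B_1\| = \|B_2\| = 1. \]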
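A very important basis, the standard basis of $\mathbb{C}^2$, is defined as:
\[ |1_1\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |1_2\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]
We similarly define the standard basis of $\mathbb{C}^n$ for any $n$ in the obvious way.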
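Orthonormality of a candidate basis can be tested numerically by computing all the pairwise inner products at once. The function below is a hypothetical helper of ours, sketched with NumPy. It relies on the standard fact that $n$ orthonormal vectors in $\mathbb{C}^n$ are automatically linearly independent and span the space, so only the Kronecker delta condition needs checking.

```python
import numpy as np

def is_orthonormal_basis(vectors):
    """Check that n vectors in C^n satisfy <B_i|B_j> = delta_ij."""
    B = np.column_stack(vectors)       # basis kets as the columns of B
    n_rows, n_cols = B.shape
    if n_rows != n_cols:               # need exactly n vectors in C^n
        return False
    gram = B.conj().T @ B              # gram[i, j] = <B_i|B_j>
    return bool(np.allclose(gram, np.eye(n_cols)))

standard = [np.array([1, 0]), np.array([0, 1])]
rotated = [np.array([1, 1]) / np.sqrt(2), np.array([1, -1]) / np.sqrt(2)]
skewed = [np.array([1, 0]), np.array([1, 1])]   # a basis, but not orthonormal
```

Here `standard` and `rotated` pass the check, while `skewed` fails it because its second vector is neither normalized nor orthogonal to the first.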
Show that the standard basis vectors satisfy the properties above.
Show that linear independence means that no vector in the basis can be written as a linear combination of the other vectors in the basis.
Any basis which is orthogonal but not orthonormal, that is, does not satisfy property 4, can be made orthonormal by normalizing each basis vector, that is, dividing it by its norm:
\[ |B_i\rangle \mapsto \frac{|B_i\rangle}{\|B_i\|}. \]
Show that if an orthogonal but not orthonormal basis satisfies properties 1-3, then it still satisfies them after normalizing it in this way.
Consider the complex vector
Normalize and find another complex vector such that the set is a basis of (i.e. satisfies all of the properties above).
Find an orthonormal basis of $\mathbb{C}^2$ which is not the standard basis or a scalar multiple of the standard basis. Show that it is indeed an orthonormal basis.
3.2.4 Matrices and the Adjoint
A matrix in $n$ dimensions is an $n \times n$ array of (complex) numbers. (Footnote 13: In fact, matrices don’t have to be square; they can have a different number of rows and columns, that is, $n \times m$ where $n \neq m$. But non-square matrices are generally not of much interest in quantum mechanics.) In $n = 2$ dimensions we have
\[ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}. \]
A matrix can act on a vector to produce another vector. If it’s a ket (a vertical/column vector), the result is another ket. If it’s a bra (a horizontal/row dual vector), the result is another bra.
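If the matrix acts on a ket, then it must act from the left, and the element at row $i$ of the resulting ket is obtained by taking the inner product of row $i$ of the matrix with the ket:
\[ A|\Psi\rangle = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} \Psi_1 \\ \Psi_2 \end{pmatrix} = \begin{pmatrix} A_{11}\Psi_1 + A_{12}\Psi_2 \\ A_{21}\Psi_1 + A_{22}\Psi_2 \end{pmatrix}. \]
If the matrix acts on a bra, then it must act from the right, and the element at column $i$ of the resulting bra is obtained by taking the inner product of column $i$ of the matrix with the bra:
\[ \langle\Psi|A = \begin{pmatrix} \Psi_1^* & \Psi_2^* \end{pmatrix} \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} \Psi_1^* A_{11} + \Psi_2^* A_{21} & \Psi_1^* A_{12} + \Psi_2^* A_{22} \end{pmatrix}. \]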
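Note that the dual vector $\langle\Psi|A$ is not the dual of the vector $A|\Psi\rangle$, as you can see by taking the dual of (3.5). However, we can define the adjoint of a matrix by transposing rows into columns and then taking the complex conjugate of all the components:
\[ A^\dagger = \begin{pmatrix} A_{11}^* & A_{21}^* \\ A_{12}^* & A_{22}^* \end{pmatrix}, \]
where the notation $\dagger$ for the adjoint is called dagger. Then the vector dual to $A|\Psi\rangle$ is $\langle\Psi|A^\dagger$, as you will check in Problem 3.22. Actually, taking the adjoint of a matrix is exactly the same operation as taking the dual of a vector! The only difference is that for a matrix we have $n$ columns to transpose into rows, while for a vector we only have one. Therefore, we have
\[ |\Psi\rangle^\dagger = \langle\Psi|, \qquad \langle\Psi|^\dagger = |\Psi\rangle, \]
and we get the following nice relation:
\[ \left(A|\Psi\rangle\right)^\dagger = \langle\Psi|A^\dagger. \]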
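The identity matrix, which we will write simply as 1, is:
\[ 1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]
Acting with it on any vector or dual vector does not change it: $1|\Psi\rangle = |\Psi\rangle$ and $\langle\Psi|1 = \langle\Psi|$.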
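The difference between acting on a ket and acting on a bra, and the role of the adjoint, can be seen in a short NumPy sketch. The matrix and vector below are arbitrary sample values of ours, and a 1-dimensional array stands in for both a row and a column, so the conjugation is written out explicitly.

```python
import numpy as np

A = np.array([[1, 2j],
              [3, 4 - 1j]])
psi = np.array([1 + 1j, 2])

ket = A @ psi                   # A|Psi>: the matrix acts from the left
bra = psi.conj() @ A            # <Psi|A: the matrix acts from the right
A_dagger = A.conj().T           # adjoint: transpose, then complex conjugate

dual_of_ket = ket.conj()        # components of the bra dual to A|Psi>
via_adjoint = psi.conj() @ A_dagger   # components of <Psi|A^dagger
```

For these values `dual_of_ket` agrees with `via_adjoint` but not with `bra`, in line with the statement that the dual of a matrix acting on a ket involves the adjoint of the matrix.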
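To rotate (real) vectors in $\mathbb{R}^2$ by an angle $\theta$, we take their product with the (real) rotation matrix:
\[ R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]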
A. Calculate the matrix .
B. Write down the vector resulting from rotating by radians, in both Cartesian and polar coordinates.
C. Repeat (B) for rotating a general 2-vector by a general angle .
D. Find the mapping between rotations of 2-vectors in $\mathbb{R}^2$ and rotations of complex numbers in $\mathbb{C}$, and explain what is the analogue of the rotation matrix in terms of complex numbers.
Show that the vector dual to $A|\Psi\rangle$ is indeed $\langle\Psi|A^\dagger$.
Calculate and separately, and then check that they are the dual of each other.
Show that $\left(A^\dagger\right)^\dagger = A$. This means that the adjoint operation is an involution, exactly like complex conjugation and taking the dual of a vector. In fact, all three are the exact same operation. By choosing an appropriate matrix, explain how taking the complex conjugate of a number is a special case of taking the adjoint of a matrix.
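Show that the action of a matrix on a vector is linear, that is,
\[ A\left(\alpha|\Psi\rangle + \beta|\Phi\rangle\right) = \alpha A|\Psi\rangle + \beta A|\Phi\rangle. \]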
3.2.5 The Outer Product
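We have seen that vectors and dual vectors may be combined to generate a complex number using the inner product. We can similarly combine a vector and a dual vector to generate a matrix, using the outer product. Given
\[ |\Psi\rangle = \begin{pmatrix} \Psi_1 \\ \Psi_2 \end{pmatrix}, \qquad \langle\Phi| = \begin{pmatrix} \Phi_1^* & \Phi_2^* \end{pmatrix}, \]
we define the outer product as the matrix whose component at row $i$, column $j$ is given by multiplying the component at row $i$ of $|\Psi\rangle$ with the component at column $j$ of $\langle\Phi|$:
\[ |\Psi\rangle\langle\Phi| = \begin{pmatrix} \Psi_1 \\ \Psi_2 \end{pmatrix} \begin{pmatrix} \Phi_1^* & \Phi_2^* \end{pmatrix} = \begin{pmatrix} \Psi_1\Phi_1^* & \Psi_1\Phi_2^* \\ \Psi_2\Phi_1^* & \Psi_2\Phi_2^* \end{pmatrix}. \]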
Note how when taking an inner product the straight lines face each other: $\langle\Psi|\Phi\rangle$, while when taking an outer product the angle brackets face each other: $|\Psi\rangle\langle\Phi|$. This shows some of the elegance of the Dirac notation! A bra-ket is an inner product, while a ket-bra is an outer product.
We can assign a rank to scalars, vectors, and matrices:
Scalars have rank 0 since they have $n^0 = 1$ component,
Vectors have rank 1 since they have $n^1 = n$ components,
Matrices have rank 2 since they have $n^2$ components.
Then the inner product reduces the rank of the vectors from 1 to 0, while the outer product increases the rank from 1 to 2.
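The contrast between the two products shows up directly in the shapes of the results. This is a sketch with NumPy and sample vectors of our own; since `np.outer` does not conjugate anything, the bra components are conjugated by hand.

```python
import numpy as np

psi = np.array([1, 1j])
phi = np.array([2, 1 + 1j])

outer = np.outer(psi, phi.conj())   # |Psi><Phi|: entry (i, j) is Psi_i * conj(Phi_j)
inner = np.vdot(phi, psi)           # <Phi|Psi>: a single complex number
```

The outer product is a $2 \times 2$ array (rank 2), while the inner product is a scalar (rank 0).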
Calculate the outer product for
Remember that when writing the dual vector, the components are complex conjugated!
3.2.6 The Completeness Relation
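Let us write the vector $|\Psi\rangle$ as a linear combination of basis vectors:
\[ |\Psi\rangle = \sum_{i=1}^{n} \lambda_i |B_i\rangle. \tag{3.7} \]
Taking the inner product of the above equation with $\langle B_j|$ and using the fact that the basis vectors are orthonormal,
\[ \langle B_j|\Psi\rangle = \sum_{i=1}^{n} \lambda_i \langle B_j|B_i\rangle = \sum_{i=1}^{n} \lambda_i \delta_{ij} = \lambda_j, \]
since all of the terms in the sum vanish except the one with $i = j$. Therefore, the coefficients $\lambda_i$ in (3.7) are given, for any vector $|\Psi\rangle$ and for any basis $|B_i\rangle$, by
\[ \lambda_i = \langle B_i|\Psi\rangle. \tag{3.8} \]
Now, since $\lambda_i$ is a scalar, and multiplication by a scalar is commutative (unlike the inner and outer products!), we can move it to the right in (3.7):
\[ |\Psi\rangle = \sum_{i=1}^{n} |B_i\rangle \lambda_i. \]
We haven’t actually done anything here; where to write the scalar, on the left or right of the vector, is completely arbitrary – it’s just conventional to write it on the left. Then, replacing $\lambda_i$ with $\langle B_i|\Psi\rangle$ as per (3.8), we get
\[ |\Psi\rangle = \sum_{i=1}^{n} |B_i\rangle \langle B_i|\Psi\rangle. \]
To make this even more suggestive, let us add parentheses:
\[ |\Psi\rangle = \left(\sum_{i=1}^{n} |B_i\rangle\langle B_i|\right) |\Psi\rangle. \]
Note that what we did here is go from a vector $|B_i\rangle$ times a complex number $\langle B_i|\Psi\rangle$ to a matrix $|B_i\rangle\langle B_i|$ times a vector $|\Psi\rangle$, for each $i$. The fact that these two different products are actually equal to one another (as you will prove in Problem 3.28) is not at all trivial, but it is one of the main reasons we like to use bra-ket notation! The notation now suggests (see Problem 3.29) that
\[ \sum_{i=1}^{n} |B_i\rangle\langle B_i| = 1, \tag{3.11} \]
where $|B_i\rangle\langle B_i|$ is the outer product defined above, and the 1 on the right-hand side is the identity matrix. This extremely useful result is called the completeness relation.
In $\mathbb{C}^2$, we simply have
\[ |B_1\rangle\langle B_1| + |B_2\rangle\langle B_2| = 1. \]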
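The completeness relation can be verified numerically for a specific basis. The sketch below uses NumPy and an orthonormal basis of $\mathbb{C}^2$ that we picked for illustration; the helper name `ket_bra` is ours.

```python
import numpy as np

# A sample orthonormal basis of C^2 (our choice, not the standard basis).
B1 = np.array([1, 1j]) / np.sqrt(2)
B2 = np.array([1, -1j]) / np.sqrt(2)

def ket_bra(ket):
    """Outer product |B><B| of a ket with its own dual."""
    return np.outer(ket, ket.conj())

completeness = ket_bra(B1) + ket_bra(B2)    # should equal the identity matrix

# Equivalent statement: the coefficients <B_i|Psi> rebuild any vector.
psi = np.array([2 - 1j, 0.5 + 3j])
rebuilt = np.vdot(B1, psi) * B1 + np.vdot(B2, psi) * B2
```

If the vectors were not orthonormal, the sum of outer products would in general differ from the identity matrix, and `rebuilt` would not match `psi`.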
Given the basis
first show that it is indeed an orthonormal basis, and then show that it satisfies the completeness relation given by ( 3.11 ).
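Provide a rigorous proof that
\[ \left(|B_i\rangle\langle B_i|\right) |\Psi\rangle = |B_i\rangle \left(\langle B_i|\Psi\rangle\right). \]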
This means that the product is associative.
3.2.7 Representing Vectors in Different Bases
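Let us consider a complex $n$-vector $|\Psi\rangle$ defined as follows:
\[ |\Psi\rangle = \begin{pmatrix} \Psi_1 \\ \vdots \\ \Psi_n \end{pmatrix}. \]
Given an orthonormal basis $|B_i\rangle$, we have seen that we can write $|\Psi\rangle$ as a linear combination of the basis vectors:
\[ |\Psi\rangle = \sum_{i=1}^{n} \lambda_i |B_i\rangle. \]
The coefficients $\lambda_i$ depend on $|\Psi\rangle$ and on the basis vectors, as we showed in (3.8):
\[ \lambda_i = \langle B_i|\Psi\rangle. \]
With these coefficients, we can represent the vector $|\Psi\rangle$ in the basis $|B_i\rangle$. This representation will be a vector of the same dimension $n$, with the components being the coefficients $\lambda_i$, and will be denoted as follows:
\[ |\Psi\rangle\big|_{B} = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix}. \]
We say that $\lambda_i$ are the coordinates of $|\Psi\rangle$ with respect to the basis $|B_i\rangle$.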
The correct way to understand the meaning of a vector is as an abstract entity, like an arrow in space, which does not depend on any particular basis – it is just there. However, if we want to do concrete calculations with a vector, we must somehow represent it numerically. This is done by choosing a basis and writing down the coordinates of the vector in that basis.
Therefore, whenever we define a vector using its components – as we have been doing throughout this chapter – there is always a specific basis in which the vector is represented, with the components being the coordinates in this basis. If no particular basis is explicitly specified, it is implied that it is the standard basis. But no representation is better than the other; we usually choose whatever basis is most convenient to work with. In quantum mechanics, we often choose a basis defined by some physical observable, as we will see below.
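Computing the coordinates of a vector in a given orthonormal basis amounts to taking one inner product per basis vector. Here is a minimal NumPy sketch; the function name `coordinates` and the sample basis are our own illustrative choices.

```python
import numpy as np

def coordinates(psi, basis):
    """Return the coefficients <B_i|Psi> for each ket B_i of an orthonormal basis."""
    return np.array([np.vdot(b, psi) for b in basis])

basis = [np.array([1, 1]) / np.sqrt(2), np.array([1, -1]) / np.sqrt(2)]
psi = np.array([1 + 0j, 0])          # components in the standard basis

lam = coordinates(psi, basis)        # components in the new basis
rebuilt = sum(l * b for l, b in zip(lam, basis))
```

The same abstract vector has components $(1, 0)$ in the standard basis and $(1/\sqrt{2}, 1/\sqrt{2})$ in the sample basis. Its norm is the same in both representations, and summing the basis vectors weighted by the coordinates recovers the original components.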
Let a vector be represented in the standard basis as
Find its representation in terms of the orthonormal basis
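Prove that the inner product (and thus also the norm) is independent of the choice of basis. That is, for any two vectors $|\Psi\rangle$ and $|\Phi\rangle$ and any two bases $|B_i\rangle$ and $|C_i\rangle$,
\[ \langle\Psi|\Phi\rangle\big|_{B} = \langle\Psi|\Phi\rangle\big|_{C}. \]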
3.2.8 Change of Basis
Let the representation of a vector in the basis be
Given a different basis , we have a different representation