Search for information about computer vision, 3D reconstruction, OpenCV projects, or robotics, and this book keeps coming up. There is a good reason it is recommended in so many contexts.
Multiple View Geometry in Computer Vision (Hartley and Zisserman 2004) is a highly organized foray into computer vision literature. It is an excellent reference text for the right audience with the right background. We will discuss the requirements and characteristics of the book below.
Great Topical Organization
There are hundreds of academic papers on computer vision that all aim to solve a problem of the form: Given this data, how do I discover this characteristic? That may sound simple, but the problem has numerous angles in computer vision. For example:
- Given two, three, or N pictures, how can I estimate the 3D structure of an object?
- Given N pictures with M points marked across them, how can I estimate the position of the camera in each picture?
- How can I do this with no points marked? What are the best ways to estimate and match these points?
- What if my surface is planar? How can I take advantage of this?
- What if I know the camera parameters ahead of time? How can I take advantage of this?
In other words, there are an infinite number of ways to formulate the core problem of computer vision: Can I discover this, using that?
The second edition of this book was published in 2004, by which time computer vision technology and theory were well under way. There are tons of papers published in all sorts of domains on this topic, and there is a ton of overlap in many sectors of academia. Between all of the research published by mathematicians, physicists, robotics engineers, and game developers, there is an overwhelming variety of approaches and notations. This book overcomes the problem of scattered and overlapping research by organizing all of it, using the same vocabulary and mathematical notation throughout. This is incredibly helpful to a newcomer to computer vision.
3D rendering researchers write Quaternions one way, and robotics engineers write them the opposite way. It is easy to see how uniform notation can be useful when you don’t even know what a Quaternion is.
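To make the notation problem concrete, here is a minimal sketch in Python+NumPy. It assumes that the mismatch in question is component order, scalar-first (w, x, y, z) versus scalar-last (x, y, z, w), which is one common way quaternion conventions differ; the paragraph above does not say which difference is meant. The helper names `rotate_wxyz` and `xyzw_to_wxyz` are made up for this illustration.

```python
import numpy as np

def rotate_wxyz(q, v):
    """Rotate 3-vector v by a unit quaternion q stored scalar-first as (w, x, y, z)."""
    w, x, y, z = q
    u = np.array([x, y, z])
    # Expanded form of q * v * q^-1 for a unit quaternion:
    # v' = v + 2w (u x v) + 2 u x (u x v)
    return v + 2.0 * w * np.cross(u, v) + 2.0 * np.cross(u, np.cross(u, v))

def xyzw_to_wxyz(q):
    """Reorder a scalar-last quaternion (x, y, z, w) into scalar-first (w, x, y, z)."""
    x, y, z, w = q
    return np.array([w, x, y, z])

# A 90-degree rotation about the z axis, written scalar-last (x, y, z, w).
s = np.sqrt(0.5)
q_xyzw = np.array([0.0, 0.0, s, s])
v = np.array([1.0, 0.0, 0.0])

# Reordered correctly, the x axis rotates onto the y axis.
correct = rotate_wxyz(xyzw_to_wxyz(q_xyzw), v)

# Read with the wrong convention, the same four numbers describe
# a different rotation entirely and the x axis flips.
misread = rotate_wxyz(q_xyzw, v)

print(correct)
print(misread)
```

The same four numbers produce two different rotations depending on the convention assumed, and nothing in the numbers themselves tells you which one the writer intended.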
Great Algorithms and Recipes
This book has stood the test of time because it didn’t latch on to a single programming language to illustrate its examples. Instead, it gives very clear cookbook-style pseudo-code for most of the algorithms you will learn about. In practice, I spent 30% of my time reading the chapters, and 70% of my time in the appendix implementing pseudo-code in my language of choice. The prevalence and ubiquity of matrix libraries in most programming languages makes this entirely feasible.
Ubiquity in the Industry
This book is referenced all the time, and not just in academia. The source code of the Calib3d module in OpenCV references this text (by chapter and section number) about a dozen times in the comments. Anyone using OpenCV should own this book. Most OpenCV users consider the core developers to be magicians. If they really are magicians, this is their book of secrets.
Great Mathematical Notation
I love good math. Good math describes complex problems in the simplest way possible without sacrificing detail. I know this book strikes that balance, because I use the mathematical style of this text to describe computer vision and rendering problems in my day-to-day work. When I put pen to paper, I use the same subscripts, squiggly lines, type font, and capitalization as this text does. It just works.
Advanced Prerequisites
This book is pleasant, but not easy, to read. There are fairly high barriers to entry. Here is my list, based on my reading:
- A solid understanding of matrix algebra. This book assumes the reader is comfortable with advanced concepts in matrix algebra, including block-matrix operations and matrix calculus.
- A familiarity with calculus and numerical optimization. The derivations in the first half of the text don’t depend on these skills, but programming most of the examples will require them.
- A familiarity with low-level computing. This book assumes familiarity with low-level computing concepts, including floating point precision and expressions of computational complexity.
- A familiarity with the pinhole camera model. Chapter 6 gives a good introduction to it, but I would not consider the discussion to be sufficiently detailed for someone new to computer vision.
- A programming language to experiment with. The authors’ tool of choice is MATLAB. The authors’ website hosts some high-quality MATLAB code samples pertaining to the examples in the book. Personally, I used R and Python+Numpy to perform experiments. The math in the text can’t be done on a calculator, so you’ll need familiarity with a language with matrix libraries to take advantage of it.
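As a rough gauge of the pinhole camera prerequisite listed above, here is a minimal sketch of the projection x = K[R | t]X in Python+NumPy. The focal length, principal point, and sample points are made-up values chosen for illustration, the camera is assumed to sit at the world origin looking down the +Z axis, and `project` is a hypothetical helper, not code from the book or the authors' website.

```python
import numpy as np

# Intrinsic matrix K: illustrative focal length (pixels) and principal point.
f, cx, cy = 800.0, 320.0, 240.0
K = np.array([[f,   0.0, cx],
              [0.0, f,   cy],
              [0.0, 0.0, 1.0]])

# Extrinsics: identity rotation and zero translation (an assumption
# made here to keep the arithmetic easy to follow).
R = np.eye(3)
t = np.zeros((3, 1))

# 3x4 camera matrix P = K [R | t].
P = K @ np.hstack([R, t])

def project(P, X):
    """Project an (N, 3) array of world points to (N, 2) pixel coordinates."""
    X_h = np.hstack([X, np.ones((X.shape[0], 1))])  # homogeneous world points
    x_h = (P @ X_h.T).T                             # homogeneous image points
    return x_h[:, :2] / x_h[:, 2:3]                 # divide by the third coordinate

points = np.array([[0.0,  0.0, 4.0],
                   [1.0, -0.5, 4.0]])
pixels = project(P, points)
print(pixels)
```

A point on the optical axis lands on the principal point (320, 240), and the second point lands at (520, 140). If the homogeneous coordinates and the final division feel unfamiliar, that is a sign the early chapters will need a slower read.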
Conclusion
Every industry has books that compile research from disparate academic sources. As mentioned above, this book does an exceptionally good job of standardizing all of that information into a uniform notation and theory. This trait alone is enough to make it stand out from most academic surveys.
I recommend getting the eBook and keeping it on your Kindle or tablet. It is helpful to have available at a moment’s notice.