Chris Conlan

Financial Data Scientist

Review: Multiple View Geometry in Computer Vision

June 26, 2017 By Chris Conlan

Search for information about computer vision, 3D reconstruction, OpenCV projects, or robotics, and you will find this book recommended again and again. There is a reason it comes up in so many contexts.

Multiple View Geometry in Computer Vision (Hartley and Zisserman 2004) is a highly organized foray into computer vision literature. It is an excellent reference text for the right audience with the right background. We will discuss the requirements and characteristics of the book below.

Great Topical Organization

There are hundreds of academic papers on computer vision that all aim to solve a problem of the form: Given this data, how do I discover this characteristic? That may sound simple, but the problem has numerous angles in computer vision. For example:

  • Given two, three, or N pictures, how can I estimate the 3D structure of an object?
  • Given N pictures with M points marked across them, how can I estimate the position of the camera in each picture?
  • How can I do this with no points marked? What are the best ways to estimate and match these points?
  • What if my surface is planar? How can I take advantage of this?
  • What if I know the camera parameters ahead of time? How can I take advantage of this?

In other words, there are an infinite number of ways to formulate the core problem of computer vision: Can I discover this, using that?

The second edition of this book was published in 2004, by which point computer vision technology and theory were well established. Papers on the topic have been published across many domains, and there is substantial overlap among sectors of academia. Between all of the research published by mathematicians, physicists, robotics engineers, and game developers, there is an overwhelming variety of approaches and notation. This book overcomes the problem of scattered and overlapping research by organizing all of it, using the same vocabulary and mathematical notation throughout. This is incredibly helpful to a newcomer to computer vision.

3D rendering researchers write Quaternions one way, and robotics engineers write them the opposite way. It is easy to see how uniform notation can be useful when you don’t even know what a Quaternion is.
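To make the notation problem concrete, here is a minimal sketch in Python with NumPy. It assumes the difference in question is component ordering, scalar-first (w, x, y, z) versus scalar-last (x, y, z, w). That is one common source of confusion between libraries, but it is only an assumption about what is meant above. The function names are hypothetical.

```python
import numpy as np

def wxyz_to_xyzw(q):
    """Reorder a scalar-first quaternion (w, x, y, z) to scalar-last (x, y, z, w)."""
    return np.roll(np.asarray(q, dtype=float), -1)

def xyzw_to_wxyz(q):
    """Reorder a scalar-last quaternion (x, y, z, w) to scalar-first (w, x, y, z)."""
    return np.roll(np.asarray(q, dtype=float), 1)

# The identity rotation looks different in the two layouts.
identity_wxyz = np.array([1.0, 0.0, 0.0, 0.0])
identity_xyzw = wxyz_to_xyzw(identity_wxyz)
```

Passing a quaternion in one layout to code that expects the other silently produces a different rotation, which is why a single notation used throughout a text is so valuable.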

Great Algorithms and Recipes

This book has stood the test of time because it didn’t latch on to a single programming language to illustrate its examples. Instead, it gives very clear cookbook-style pseudo-code for most of the algorithms you will learn about. In practice, I spent 30% of my time reading the chapters, and 70% of my time in the appendix implementing pseudo-code in my language of choice. The availability of matrix libraries in most programming languages makes this entirely feasible.
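As an illustration of how directly such a recipe maps onto a matrix library, here is a minimal sketch of a basic direct linear transformation (DLT) for estimating a planar homography from point correspondences, written in Python with NumPy. It is my own simplified version, not a transcription of the book's pseudo-code: it skips the coordinate normalization step that is normally recommended for noisy data, and the function names are hypothetical.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 H with dst ~ H @ src from at least 4 point pairs (basic DLT)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Fix the arbitrary scale; assumes H[2, 2] is not (near) zero.
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map Nx2 points through H and convert back from homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

With exact correspondences, `estimate_homography(src, apply_homography(H, src))` should recover `H` up to floating point error. With real, noisy measurements you would want normalization and a robust estimator on top of this.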

Ubiquity in the Industry

This book is referenced all the time, and not just in academia. The source code of the Calib3d module in OpenCV references this text (by chapter and section number) about a dozen times in the comments. Anyone using OpenCV should own this book. Most OpenCV users consider the core developers to be magicians. If they really are magicians, this is their book of secrets.

Great Mathematical Notation

I love good math. Good math describes complex problems in the simplest way that does not sacrifice detail. I know this book strikes that balance, because I use the mathematical style of this text to describe computer vision and rendering problems in my day-to-day work. When I put pen to paper, I use the same subscripts, squiggly lines, type font, and capitalization as are used in this text. It just works.

Advanced Prerequisites

This book is pleasant, but not easy, to read. There are fairly high barriers to entry. This is my list, based on my reading:

  1. A solid understanding of matrix algebra. This book assumes the reader is comfortable with advanced concepts in matrix algebra, including block-matrix operations and matrix calculus.
  2. A familiarity with calculus and numerical optimization. The derivations in the first half of the text don’t depend on these skills, but programming most of the examples will require them.
  3. A familiarity with low-level computing. This book assumes familiarity with low-level computing concepts, including floating point precision and expressions of computational complexity.
  4. A familiarity with the pinhole camera model. Chapter 6 gives a good introduction to it, but I would not consider the discussion to be sufficiently detailed for someone new to computer vision.
  5. A programming language to experiment with. The authors’ tool of choice is MATLAB. The authors’ website hosts some high-quality MATLAB code samples pertaining to the examples in the book. Personally, I used R and Python+Numpy to perform experiments. The math in the text can’t be done on a calculator, so you’ll need familiarity with a language with matrix libraries to take advantage of it.
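For readers unsure whether they meet the pinhole camera prerequisite, here is a minimal sketch of the model in Python with NumPy. It projects 3D points to pixel coordinates with x ~ K [R | t] X. The intrinsic matrix values below (focal length 800, principal point at 320, 240) are made-up illustration values, and the function name is hypothetical.

```python
import numpy as np

def project_points(K, R, t, points_3d):
    """Project Nx3 world points to Nx2 pixel coordinates using x ~ K [R | t] X."""
    points_3d = np.asarray(points_3d, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3, 1)
    P = K @ np.hstack([R, t])  # 3x4 camera matrix
    homog = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    x = homog @ P.T
    # Divide by the third coordinate to leave homogeneous coordinates.
    return x[:, :2] / x[:, 2:3]

# Illustrative intrinsics (assumed values, not from any real camera).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)    # camera axes aligned with the world axes
t = np.zeros(3)  # camera centre at the world origin

pixels = project_points(K, R, t, [[0.0, 0.0, 5.0], [1.0, 0.0, 5.0]])
```

If the roles of K, R, and t here are unfamiliar, it is probably worth working through a gentler introduction to the camera model before starting the book.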

Conclusion

Every industry has books that compile research from disparate academic sources. As mentioned above, this book does an exceptionally good job of standardizing all of that information into a uniform notation and theory. This trait alone is enough to make it stand out from most academic surveys.

I recommend getting the eBook and keeping it on your Kindle or tablet. It is helpful to have available at a moment’s notice.

Filed Under: 3D Technology and Virtual Reality, Computer Vision
