Chris Conlan

Financial Data Scientist

  • About
  • Blog
    • Business Management
    • Programming with Python
    • Programming with R
    • Automated Trading
    • 3D Technology and Virtual Reality
  • Books
    • The Financial Data Playbook
    • Fast Python
    • Algorithmic Trading with Python
    • The Blender Python API
    • Automated Trading with R
  • Snippets

Fastest Way to Flatten a List in Python

June 5, 2020 By Chris Conlan 23 Comments

In my latest book, Fast Python, I bust a lot of speed myths in Python. Some of the most basic advice people give about optimizing Python code is wrong, likely because no one ever bothered to test it. In this book, I go back to the basics and verify (or bust) all of the advice and assumed knowledge that we share as a programming community.

In this post, I’ll show you five ways to flatten a list in Python, and then show you the test results. I consider these results to be a myth-buster, because they are not at all what I expected based on what we consider best practice in the Python community.

The methods

# Use long-form list addition
def slow_add_flatten_lists(the_lists):
    result = []
    for _list in the_lists:
        result = result + _list
    return result

# Use list addition with the += operator
def add_flatten_lists(the_lists)
    result = []
    for _list in the_lists:
        result += _list
    return result

# Use List.extend
def extend_flatten_lists(the_lists):
    result = []
    for _list in the_lists:
        result.extend(_list)
    return result

# Use List.append
def append_flatten_lists(the_lists)
    result = []
    for _list in the_lists:
        for val in _list:
            result.append(val)
    return result

# Use a nested list comprehension
def comprehension_flatten_lists(the_lists)
    return [val for _list in the_lists for val in _list]

Most people would expect the third or fifth option to be fastest, because they are the most concise and best utilize Python’s existing data structures to achieve the desired outcome, but this is not the case. For some reason, the second method, using list addition with the += operator, is the fastest by a large margin. Here are the profiles.

Most people assume that the += operator on lists must be slow, because a += b is likely just syntactical sugar for a = a + b, which is provably slow. This isn’t the case. The += operator is optimized in some other ways for Python lists, as is proven by the code profiles, and it is not at all equivalent to a = a + b. Surprisingly enough, it is not only optimized, but it ends up being the fastest known method the flatten a list.

Check out Fast Python if you are interested in busting more speed myths and learning how to optimize your Python code. Writing this book has been very helpful to my own programming, because it has given me access to empirical evidence for my speed-based design decisions.

Filed Under: Programming with Python, Snippets

Comments

  1. Terry says

    June 9, 2020 at 8:38 pm

    What about itertools.chain.from_iterable?

    Reply
    • Chris Conlan says

      June 9, 2020 at 8:45 pm

      Someone mentioned that before. Does that compress a list in one shot? How does it work? I will likely include it in the repo.

      Reply
      • Vinny Murphy says

        June 9, 2020 at 10:58 pm

        lol = [‘a’, [‘a’, ‘b’, ‘c’], range(1,22)]
        lol
        import itertools as it
        list(it.chain.from_iterable(lol))

        Reply
        • Terry says

          June 10, 2020 at 12:02 am

          or [*it.chain.from_iterable(lol)], since the list() is slower than [].

          Reply
      • Tin Tvrtković (@tintvrtkovic) says

        June 9, 2020 at 11:15 pm

        https://twitter.com/tintvrtkovic/status/1248995617256374273

        I didn’t think to try += though, it seems quadratic on the face of it.

        Reply
      • Tin Tvrtković (@tintvrtkovic) says

        June 9, 2020 at 11:16 pm

        https://twitter.com/tintvrtkovic/status/1248995617256374273

        I didn’t think to try +=, it seemed quadratic on the face of it…

        Reply
    • Reece Hart says

      June 9, 2020 at 11:57 pm

      What Terry said.

      list_of_lists = [[f”{l}{i}” for i in range(3)] for l in “ABC”]
      list_of_lists
      [[‘A0’, ‘A1’, ‘A2’], [‘B0’, ‘B1’, ‘B2’], [‘C0’, ‘C1’, ‘C2’]]

      import itertools
      list(itertools.chain.from_iterable(list_of_lists)
      [‘A0’, ‘A1’, ‘A2’, ‘B0’, ‘B1’, ‘B2’, ‘C0’, ‘C1’, ‘C2’]

      I didn’t time it, but I’ll be surprised if it’s slower.

      Also, unlike all of the examples used, this method uses a generator and would be well-suited to streaming very large inputs with minimal memory use. For example:

      flat_list_generator = itertools.chain.from_iterable(list_of_lists)
      next(flat_list_generator)
      ‘A0’
      next(flat_list_generator)
      ‘A1’
      next(flat_list_generator)
      ‘A2’

      Reply
      • Chris Conlan says

        June 10, 2020 at 1:22 am

        I tried the itertools variant. It is exactly the same speed at the List.extend variant. So, still not the fastest. I’ve added it to the GitHub repo: https://github.com/chrisconlan/fast-python/blob/master/src/flatten_lists.py

        Reply
        • Terry says

          June 10, 2020 at 1:30 am

          What about [*] instead of list()?

          Reply
          • Chris Conlan says

            June 10, 2020 at 1:50 am

            Same performance. Doesn’t affect it

    • Chris Conlan says

      June 10, 2020 at 1:25 am

      I tested it. See my response to Reese.

      Reply
  2. Egert says

    June 10, 2020 at 11:29 am

    Has anyone tried the following:

    list = sum(list, [])

    Quick tests on Python 3.8 with timeit and a short list show it the fastest, don’t know how it scales though. Got the snippet from StackOverflow (https://stackoverflow.com/questions/50082861/how-to-convert-a-magic-square-into-a-proper-magic-square-in-python-3)

    Reply
  3. Ruud van der Ham says

    June 10, 2020 at 2:05 pm

    There’s another fast way to flatten a list:

    def reduce_flatten_list(
    the_lists: List[List[str]]) -> List[str]:
    “””
    Flatten a list of lists via functools.reduce
    “””
    return list(functools.reduce(operator.iconcat, the_lists, []))

    I benchmarked this and (on my machine) this is the fastest up to input length 10^5. Thereafter it’s nearly as fast the add_flatten_lists.
    I also noticed that my graphs are not consistent with yours (due to other processor, OS, …?)

    Reply
    • Chris Conlan says

      June 10, 2020 at 2:07 pm

      It is very possible your profiles are different. I used whatever Python version ships with Anaconda, which might have OS-level or byte-compiler-level optimizations that others don’t.

      One of the big lessons here is that we don’t know as Python programmers which C-level optimizations are in-play at any given time, so we just have to profile to find out.

      Reply
      • Ruud van der Ham says

        June 10, 2020 at 2:09 pm

        What do you think of the quite different approach with reduce?

        Reply
        • Chris Conlan says

          June 10, 2020 at 2:14 pm

          I haven’t tried it. I encourage you to download the repo and consider contributing if you think your example is constructive: https://github.com/chrisconlan/fast-python

          Reply
          • Ruud van der Ham says

            June 10, 2020 at 2:27 pm

            I think it’s easier if you just add it yourself.

  4. Garry Offord says

    June 12, 2020 at 1:31 pm

    I’ve been using this function to flatten an arbitrary deep nested list:

    def flatten(arg: list, sort_results=False) -> list:
        """
        This function takes an object graph (really, just a list of lists/tuples of
        lists/tuples etc) and flattens it into a one-dimensional list.
        """
    
        def helper(result, arg):
            if isinstance(arg, list):
                for x in arg:
                    helper(result, x)
            elif isinstance(arg, tuple):
                for x in arg:
                    helper(result, x)
            elif isinstance(arg, set):
                for x in sorted(arg):
                    helper(result, x)
            else:
                result.append(arg)
            return result
    
        r = helper(list(), arg)
        return sorted(r) if sort_results else r
    
    Reply
  5. Tragen says

    October 11, 2020 at 6:02 pm

    what a about

    sum(the_lists, [])
    
    Reply
  6. Isaias says

    November 4, 2020 at 11:09 pm

    Indeed, I’ve checked that add_flatten_lists is faster even compared to :

    import functools, operator
    def functools_flatten(list_of_lists: Iterable[Iterable]):
    return functools.reduce(operator.iconcat, list_of_lists, [])

    Which was the fastest approach for python2

    Results for list of n list with variable number of up to 5 inner elements.
    n add_flatten_lists functools_flatten
    1 3.024376928806305e-07 5.057640373706817e-07
    10 6.841495633125305e-07 9.091570973396302e-07
    100 4.355348646640778e-06 5.3446926176548e-06
    1000 4.190962761640549e-05 4.575522616505623e-05
    10000 0.00041823429986834524 0.00044615136459469796
    100000 0.004164023883640766 0.004406414236873388
    1000000 0.04450144730508328 0.04704729903489351
    10000000 0.5981661455519497 0.6272326799668372

    Reply
  7. Drew Grant says

    August 13, 2021 at 7:43 pm

    Thanks for this. I’m dealing with multidimensional lists with over a million rows once flattened. Flattening them has been quite the time-consuming challenge but the extend_flatten_lists() function you provided is exceptionally fast and computationally efficient.

    Reply
  8. Hemerson Tacon says

    November 23, 2021 at 4:31 pm

    This version of extend beats the += one:

    def new_extend_flatten_lists(the_lists):
        result = []
        extend = result.extend
        for _list in the_lists:
            extend(_list)
        return result
    
    Reply
  9. Hemerson Tacon says

    November 23, 2021 at 4:35 pm

    def new_extend_flatten_lists(the_lists):
    result = []
    extend = result.extend
    for _list in the_lists:
    extend(_list)
    return result

    Reply

Leave a Reply Cancel reply

Fast Python: Master the Basics to Write Faster Code

Fast Python: Master the Basics to Write Faster Code

Available for purchase at Amazon.com.

Featured Posts: Python Programming

Pulling All Sorts of Financial Data in Python [Updated for 2021]

When to Use Heap Sort to Speed Up your Python Code

70+ Code Profiles of Common Python Algorithms

Topics

  • 3D Technology and Virtual Reality (8)
  • Automated Trading (9)
  • Business Management (9)
  • Chris Conlan Blog (5)
  • Computer Vision (2)
  • Programming with Python (16)
  • Programming with R (6)
  • Snippets (8)
  • Email
  • LinkedIn
  • RSS
  • YouTube

Copyright © 2022 · Enterprise Pro Theme On Log in