T>T: Effortless Parallel For Loops in Python
Post
Cancel

# Try the code yourself!

Click the following button to launch an ipython notebook on Google Colab which implements the code developed in this post:

The Problem

What is the easiest way to parallelize a for loop in Python?

The Solution

Using the joblib library. Consider the following, non-parallelized example where we generate a list of integers and calculate their factorials. There are better ways to do this using vectorised operations in numpy but this is just used as a demonstration.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 import time import math def compute_factorial(n: int) -> int: """ Compute a factorial of a provided integer Args: n (int): The integer to calculate the factorial of Returns: int: The calculated factorial """ return math.factorial(n) # list of numbers numbers = [i for i in range(1, 100000)] # measure time taken to compute factorials sequentially start_time = time.time() factorials_sequential = [compute_factorial(number) for number in numbers] end_time = time.time() print(f"Sequential computation took {end_time - start_time:.4f} seconds") 

Now we run the same function but this time we split the list into ‘chunks’, each of which is processed on a different CPU core in parallel to speed up their calculation using the joblib library.

1 2 3 4 5 6 7 8 from joblib import Parallel, delayed, cpu_count # measure time taken to compute factorials in parallel start_time = time.time() factorials_parallel = Parallel(n_jobs=cpu_count())(delayed(compute_factorial)(number) for number in numbers) end_time = time.time() print(f"Parallel computation took {end_time - start_time:.4f} seconds") 

That’s it! Bonus being that it is a one-liner too.

A description of the syntax for the parallel code is as follows:

1. delayed:
• delayed(compute_factorial)(number) wraps the compute_factorial function call, marking it for delayed execution.
• This means the function call and its argument (number) are not executed immediately but are instead prepared to be dispatched to a worker process or thread when Parallel runs.
2. Parallel:
• Parallel(n_jobs=cpu_count()) initializes a parallel computing environment with as many workers as there are CPU cores available (n_jobs=cpu_count()).
• The list comprehension [delayed(compute_factorial)(number) for number in numbers] generates a list of delayed function calls.
• Parallel takes this list of delayed calls and executes them in parallel, distributing the work across the available CPU cores.
• The result is a list of computed factorials, executed in parallel.

On my MacBook the results are:

• Sequential = 81.4 seconds
• Parallel = 10.4 seconds

~ 8 times faster. There are seemingly a million ways to parallelize things in Python, some of which are not true parallelization as they are still bound by the global interpreter lock, GIL. I have always found joblib the easiest way to parallelise a for loop.

This post is licensed under CC BY 4.0 by the author.