# Try the code yourself!

Click the following button to launch an ipython notebook on Google Colab which implements the code developed in this post:

**The Problem**

What is the easiest way to parallelize a for loop in Python?

**The Solution**

Using the `joblib`

library. Consider the following, non-parallelized example where we generate a list of integers and calculate their factorials. There are better ways to do this using vectorised operations in `numpy`

but this is just used as a demonstration.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25

import time
import math
def compute_factorial(n: int) -> int:
"""
Compute a factorial of a provided integer
Args:
n (int): The integer to calculate the factorial of
Returns:
int: The calculated factorial
"""
return math.factorial(n)
# list of numbers
numbers = [i for i in range(1, 100000)]
# measure time taken to compute factorials sequentially
start_time = time.time()
factorials_sequential = [compute_factorial(number) for number in numbers]
end_time = time.time()
print(f"Sequential computation took {end_time - start_time:.4f} seconds")

Now we run the same function but this time we split the list into ‘chunks’, each of which is processed on a different CPU core in parallel to speed up their calculation using the `joblib`

library.

1
2
3
4
5
6
7
8

from joblib import Parallel, delayed, cpu_count
# measure time taken to compute factorials in parallel
start_time = time.time()
factorials_parallel = Parallel(n_jobs=cpu_count())(delayed(compute_factorial)(number) for number in numbers)
end_time = time.time()
print(f"Parallel computation took {end_time - start_time:.4f} seconds")

That’s it! Bonus being that it is a one-liner too.

A description of the syntax for the parallel code is as follows:

`delayed`

:`delayed(compute_factorial)(number)`

wraps the`compute_factorial`

function call, marking it for delayed execution.- This means the function call and its argument (
`number`

) are not executed immediately but are instead prepared to be dispatched to a worker process or thread when`Parallel`

runs.

`Parallel`

:`Parallel(n_jobs=cpu_count())`

initializes a parallel computing environment with as many workers as there are CPU cores available (`n_jobs=cpu_count()`

).- The list comprehension
`[delayed(compute_factorial)(number) for number in numbers]`

generates a list of delayed function calls. - Parallel takes this list of delayed calls and executes them in parallel, distributing the work across the available CPU cores.
- The result is a list of computed factorials, executed in parallel.

On my MacBook the results are:

- Sequential = 81.4 seconds
- Parallel = 10.4 seconds

~ 8 times faster. There are seemingly a million ways to parallelize things in Python, some of which are not true parallelization as they are still bound by the global interpreter lock, GIL. I have always found `joblib`

the easiest way to parallelise a `for`

loop.