Blazing Fast ETLs with Simultaneous MultiProcessing and MultiThreading
6 min read · May 27, 2023
How I got a 66.4x reduction in code execution time

MultiProcessing vs MultiThreading
In Python, multiprocessing and multithreading are both ways to run code concurrently, but they differ in whether that work can truly run in parallel and in how they use system resources. Here are the main differences between multiprocessing and multithreading:
- Parallelism: Multiprocessing achieves parallelism by running multiple processes, each in its own separate memory space. These processes can be scheduled on different CPU cores, so they execute in true parallel. Multithreading, by contrast, splits the work across multiple threads inside a single process; the threads share the same memory space and run concurrently. However, CPython's Global Interpreter Lock (GIL) allows only one thread at a time to execute Python bytecode, so multithreading generally does not speed up CPU-bound pure-Python code: those threads cannot use more than one CPU core at a time.
- Memory: In multiprocessing, each process has its own memory space, so processes do not share memory by default. Sharing data between processes requires explicit communication mechanisms such as pipes, queues, or shared memory. In multithreading, threads share the same memory space and can read and modify the same data directly. However, this shared memory can cause race conditions, so synchronization primitives such as locks are needed to keep the data consistent.
- CPU-bound vs. I/O-bound tasks: Multiprocessing is well suited to CPU-bound tasks, where spreading the computation across multiple cores can yield significant speedups, because each process runs independently and can use its own CPU core. Multithreading is better suited to I/O-bound tasks, such as network communication or file operations: while one thread is blocked waiting on I/O (during which it releases the GIL), other threads keep running, so the waiting times overlap instead of adding up.
- Overhead: Multiprocessing typically incurs more overhead compared to multithreading due to the creation of separate processes, context switching, and inter-process communication mechanisms…
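To make the CPU-bound vs. I/O-bound distinction concrete, here is a minimal sketch using the standard library's `concurrent.futures` executors. It is not the author's ETL code: the functions `cpu_bound`, `io_bound` and `timed` are hypothetical, `time.sleep` stands in for a real network or disk call, and the workload sizes are arbitrary. The printed timings depend on the machine and its core count, so no expected numbers are given.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n: int) -> int:
    """Pure-Python arithmetic: it needs the GIL to make progress, so threads running it take turns."""
    return sum(i * i for i in range(n))


def io_bound(delay: float) -> float:
    """Stand-in for a network or disk call; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    return delay


def timed(executor_cls, fn, args, workers: int = 4):
    """Map fn over args with the given executor class; return (results, wall-clock seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(fn, args))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    io_args = [0.5] * 4            # four simulated 0.5 s waits
    cpu_args = [2_000_000] * 4     # four identical pure-Python computations

    # I/O-bound work on threads: the waits overlap, so this takes roughly 0.5 s, not 2 s.
    _, t = timed(ThreadPoolExecutor, io_bound, io_args)
    print(f"threads,   I/O-bound: {t:.2f} s")

    # CPU-bound work on threads: the GIL lets only one thread compute at a time.
    _, t = timed(ThreadPoolExecutor, cpu_bound, cpu_args)
    print(f"threads,   CPU-bound: {t:.2f} s")

    # CPU-bound work on processes: each worker has its own interpreter and GIL,
    # so the computations can run on separate cores.
    _, t = timed(ProcessPoolExecutor, cpu_bound, cpu_args)
    print(f"processes, CPU-bound: {t:.2f} s")
```

On a standard CPython build you would typically expect the threaded I/O run to finish in about the time of a single wait, the threaded CPU run to take about as long as running the four calls one after another, and the process pool to finish the CPU run faster on a multi-core machine, minus the cost of starting processes and pickling arguments. The `if __name__ == "__main__":` guard matters here: with the "spawn" start method, child processes re-import the main module and would otherwise start pools of their own.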
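The memory difference shows up clearly with a shared counter. In this minimal sketch (the names `counter`, `add_in_thread`, `add_in_process`, `run_threads` and `run_processes` are illustrative, not from the original article), the threads update one module-level variable guarded by a `threading.Lock`, while each child process increments its own private copy and has to send its result back through a `multiprocessing.Queue`.

```python
import threading
from multiprocessing import Process, Queue

counter = 0               # module-level state that the workers will try to update
lock = threading.Lock()   # protects counter when several threads modify it


def add_in_thread(n: int) -> None:
    """Increment the one counter shared by every thread in this process."""
    global counter
    for _ in range(n):
        with lock:        # without the lock, concurrent += updates could be lost
            counter += 1


def add_in_process(n: int, out) -> None:
    """Increment this child process's own copy of counter, then report it."""
    global counter
    for _ in range(n):
        counter += 1
    out.put(counter)      # explicit communication: the parent cannot see this memory


def run_threads(workers: int = 4, n: int = 10_000) -> int:
    global counter
    counter = 0
    threads = [threading.Thread(target=add_in_thread, args=(n,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def run_processes(workers: int = 4, n: int = 10_000) -> tuple[int, list[int]]:
    global counter
    counter = 0
    q = Queue()
    procs = [Process(target=add_in_process, args=(n, q)) for _ in range(workers)]
    for p in procs:
        p.start()
    per_process = [q.get() for _ in procs]   # drain the queue before joining
    for p in procs:
        p.join()
    return counter, per_process


if __name__ == "__main__":
    print("threads:", run_threads())                     # threads: 40000
    parent_view, child_views = run_processes()
    print("processes, parent sees:", parent_view)        # processes, parent sees: 0
    # processes, children report: [10000, 10000, 10000, 10000]
    print("processes, children report:", child_views)
```

The threads all add to the same variable, so the parent sees the combined total. Each process only changes its own copy, so the parent's `counter` stays at 0 and the per-process results arrive only through the queue. Calling `q.get()` before `join()` follows the `multiprocessing` documentation's advice to consume queued items before joining the processes that produced them, which avoids a possible deadlock.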