Python Multiprocessing
The `multiprocessing` module lets you create and manage multiple processes, enabling parallel execution of tasks. This is particularly useful for CPU-bound operations, where threading is often ineffective because of CPython's Global Interpreter Lock (GIL). Each process runs its own Python interpreter with its own memory space (and its own GIL), so CPU-bound work in different processes can run on multiple CPU cores at the same time.
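The sketch below spreads CPU-bound work across cores. It maps a toy function over a pool sized with `os.cpu_count()` and uses `os.getpid()` to record which process handled each call. The function `cpu_heavy` and the input sizes are hypothetical choices for demonstration, and no timing claims are implied.

```python
import os
from multiprocessing import Pool


def cpu_heavy(n):
    """Hypothetical CPU-bound task: sum of squares below n, plus the worker's PID."""
    total = sum(i * i for i in range(n))
    return total, os.getpid()


if __name__ == "__main__":
    inputs = [200_000] * 8
    # One worker per CPU core; cpu_count() can return None, so fall back to 1.
    workers = os.cpu_count() or 1
    with Pool(processes=workers) as pool:
        outputs = pool.map(cpu_heavy, inputs)

    totals = [total for total, _ in outputs]
    worker_pids = {pid for _, pid in outputs}
    print(f"Parent PID: {os.getpid()}")
    print(f"Worker PIDs used: {sorted(worker_pids)}")
```

Each worker is a separate process, so none of the reported worker PIDs matches the parent's PID.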
Key Concepts of Multiprocessing
- Process: A process is an independent unit of execution with its own memory space. In Python, the `Process` class from the `multiprocessing` module represents an individual process.
- Parallelism: For pure Python code in standard CPython, multiprocessing achieves true parallelism, whereas threads are limited by the GIL.
- Shared Memory: Processes do not share memory by default. However, `multiprocessing` provides tools like `Value` and `Array` to share simple data.
- Inter-Process Communication (IPC): Mechanisms like queues and pipes let processes exchange data.
Core Components of the `multiprocessing` Module
1. Process Class
- Used to create and manage individual processes.
- Example:
```python
from multiprocessing import Process


def print_square(num):
    print(f"The square of {num} is {num * num}")


if __name__ == "__main__":
    p = Process(target=print_square, args=(5,))
    p.start()
    p.join()
```
- `target`: Specifies the function to run in the process.
- `args`: A tuple of positional arguments to pass to the target function (hence the trailing comma in `(5,)`).
- `start()`: Starts the process.
- `join()`: Waits for the process to finish.
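`Process` also accepts keyword arguments and an optional name, and it records an exit code after `join()`. The sketch below uses a hypothetical `greet` function to show these standard parameters and attributes.

```python
from multiprocessing import Process


def greet(name, punctuation="!"):
    """Hypothetical target taking both positional and keyword arguments."""
    print(f"Hello, {name}{punctuation}")


if __name__ == "__main__":
    p = Process(
        target=greet,
        args=("Alice",),              # positional arguments as a tuple
        kwargs={"punctuation": "?"},  # keyword arguments as a dict
        name="greeter",               # optional human-readable name
    )
    print(p.is_alive())  # False: not started yet
    p.start()
    p.join()
    # An exit code of 0 means the target returned normally.
    print(p.name, p.exitcode)
```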
2. Pool Class
- Simplifies working with multiple processes by managing a pool of worker processes.
- Example:
```python
from multiprocessing import Pool


def square(num):
    return num * num


if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    with Pool(processes=4) as pool:
        results = pool.map(square, numbers)
    print(results)  # Output: [1, 4, 9, 16, 25]
```
- `map`: Applies a function to each item of an iterable, splitting the items across the worker processes and returning the results as a list in input order.
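`map` passes exactly one argument per call. For functions that take several arguments, `Pool.starmap` unpacks each tuple into positional arguments. `Pool.imap` yields results lazily instead of building the whole list at once. The functions `power` and `square` below are hypothetical examples.

```python
from multiprocessing import Pool


def power(base, exponent):
    """Hypothetical two-argument function; map cannot pass both arguments."""
    return base ** exponent


def square(num):
    return num * num


if __name__ == "__main__":
    pairs = [(2, 3), (3, 2), (4, 0.5)]
    with Pool(processes=2) as pool:
        # starmap unpacks each tuple into positional arguments.
        powers = pool.starmap(power, pairs)
        # imap yields results one at a time, still in input order.
        squares = list(pool.imap(square, range(5)))
    print(powers)   # [8, 9, 2.0]
    print(squares)  # [0, 1, 4, 9, 16]
```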
3. Queue and Pipe
- For communication between processes.
- Example using `Queue`:
```python
from multiprocessing import Process, Queue


def producer(queue):
    queue.put("Hello from producer!")


def consumer(queue):
    msg = queue.get()  # blocks until an item is available
    print(f"Consumer received: {msg}")


if __name__ == "__main__":
    q = Queue()
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
```
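The heading also names `Pipe`. A pipe connects exactly two endpoints, whereas a `Queue` can be shared by many producers and consumers. The following is a minimal request/reply sketch with a hypothetical `worker` function.

```python
from multiprocessing import Pipe, Process


def worker(conn):
    """Hypothetical worker: receive a number, send back its square, then close."""
    num = conn.recv()
    conn.send(num * num)
    conn.close()


if __name__ == "__main__":
    # Pipe() returns two Connection objects; it is two-way (duplex) by default.
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send(7)
    reply = parent_conn.recv()
    p.join()
    print(f"Parent received: {reply}")  # Parent received: 49
```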
4. Shared Memory (Value and Array)
- For sharing simple data between processes.
- Example:
```python
from multiprocessing import Process, Value


def increment(counter):
    # += is a read-modify-write, so hold the Value's lock (process-safe)
    with counter.get_lock():
        counter.value += 1


if __name__ == "__main__":
    counter = Value('i', 0)  # 'i' = signed C int, initial value 0
    processes = [Process(target=increment, args=(counter,)) for _ in range(10)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(f"Final counter value: {counter.value}")  # Final counter value: 10
```
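`Array` works like `Value` but holds a fixed-length sequence of one C type. In this sketch, a child process modifies the shared array in place and the parent reads the result. The function `double_all` is hypothetical.

```python
from multiprocessing import Array, Process


def double_all(shared):
    """Hypothetical worker: double every element in place."""
    with shared.get_lock():  # hold the lock for the whole update
        for i in range(len(shared)):
            shared[i] *= 2


if __name__ == "__main__":
    arr = Array('d', [1.0, 2.0, 3.0])  # 'd' = C double
    p = Process(target=double_all, args=(arr,))
    p.start()
    p.join()
    print(arr[:])  # [2.0, 4.0, 6.0]
```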
5. Lock
- Lets only one process at a time run a critical section, preventing race conditions.
- Example:
```python
from multiprocessing import Lock, Process


def printer(lock, message):
    with lock:  # only one process prints at a time
        print(message)


if __name__ == "__main__":
    lock = Lock()
    messages = ["Hello", "World", "From", "Multiprocessing"]
    processes = [Process(target=printer, args=(lock, msg)) for msg in messages]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```
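A `Lock` is also useful when several shared values must change together. This sketch uses two hypothetical account balances created with `Value(..., lock=False)`, so neither carries a lock of its own. A single explicit `Lock` keeps each two-step transfer from interleaving with another worker's.

```python
from multiprocessing import Lock, Process, Value


def transfer(lock, source, target, amount, times):
    """Hypothetical worker: move `amount` from source to target `times` times."""
    for _ in range(times):
        with lock:  # no other worker can run between these two updates
            source.value -= amount
            target.value += amount


if __name__ == "__main__":
    lock = Lock()
    # lock=False: raw shared values; the explicit Lock above guards them.
    account_a = Value('i', 1000, lock=False)
    account_b = Value('i', 0, lock=False)
    workers = [
        Process(target=transfer, args=(lock, account_a, account_b, 1, 100))
        for _ in range(4)
    ]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(account_a.value, account_b.value)  # 600 400
```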
Advantages of Multiprocessing
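- True Parallelism: Runs tasks simultaneously on multiple CPU cores.
- No GIL Limitation: Each process has its own interpreter and GIL, so CPU-bound Python code in different processes is not serialized by a single lock.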
- Improved Performance: Suitable for CPU-bound operations such as computations or simulations.
Common Pitfalls
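- Overhead: Starting processes and passing data between them (which requires pickling) is expensive, so multiprocessing pays off mainly for larger tasks.
- Debugging: Debugging is harder because processes run concurrently in separate memory spaces and their output can interleave.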
- Data Sharing: Shared memory requires careful handling to avoid race conditions.
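- Compatibility: Code that starts processes must sit under an `if __name__ == "__main__":` guard when the spawn start method is used (the default on Windows and macOS). Without it, each child re-imports the main module and tries to start new processes again.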
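This sketch makes the guard requirement visible on any OS. It requests the spawn start method explicitly through `multiprocessing.get_context("spawn")`, so the child re-imports the module just as it would on Windows. The `report` function and the process name are hypothetical. The parent reads from the queue before calling `join()`, because joining a process that still has queued data can deadlock.

```python
import multiprocessing as mp


def report(queue):
    """Hypothetical target: send this child's process name back to the parent."""
    queue.put(mp.current_process().name)


if __name__ == "__main__":
    # Request 'spawn' explicitly so the script behaves the same on every OS.
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=report, args=(q,), name="spawned-child")
    p.start()
    child_name = q.get()  # read before join() to avoid blocking on queued data
    p.join()
    print(f"Child process name: {child_name}")  # Child process name: spawned-child
```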