All code and resources for this post are available on GitHub. You can follow along by cloning the repository!

For years, Python's Global Interpreter Lock (GIL) has been a bottleneck for CPU-bound workloads in areas such as machine learning, data science, scientific computing, APIs, and ETL workflows, which is why it has long been a hot topic in the Python community. With Python 3.13, it is finally possible to run Python without the GIL. The standard 3.13 build still includes the GIL, but 3.13 also ships an experimental free-threaded build that disables it and lays the groundwork for true multi-core performance. In this blog post, I take a closer look at the GIL and explore the potential of working without it.

What is the GIL?

The Global Interpreter Lock (GIL) is a mechanism in the CPython interpreter that ensures only one thread executes Python bytecode at a time. This protects access to Python objects and ensures thread safety during memory management.

When we run Python code, CPython compiles it to bytecode, which the interpreter then executes. Every thread needs the interpreter to execute its bytecode, and the GIL ensures that only one thread holds the interpreter at any given time. As a result, even in a multi-threaded program, only one thread actually runs Python code at a time while the others wait for their turn.
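The "waiting for their turn" part is governed by a switch interval. CPython asks the thread holding the GIL to release it after this interval so that another thread can run. The interval is exposed through sys.getswitchinterval() and sys.setswitchinterval(), and it defaults to 5 ms. The following minimal sketch reads the interval and changes it temporarily. Tuning it only changes how often threads take turns. It does not make them run at the same time.

```python
import sys

# Interval (in seconds) after which CPython asks the running thread
# to release the GIL so another thread can take a turn.
interval = sys.getswitchinterval()
print(f"GIL switch interval: {interval * 1000:.1f} ms")  # 5.0 ms unless changed

# Shorter intervals mean more frequent hand-offs, not parallel execution.
sys.setswitchinterval(0.001)
print(f"Temporary switch interval: {sys.getswitchinterval() * 1000:.1f} ms")

# Restore the original value.
sys.setswitchinterval(interval)
```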

Existing Parallelism in CPython

Despite the GIL, parallelism is still achievable in CPython in several ways:

Threading: Python supports multi-threading through the threading module, but its usefulness for CPU-bound work is limited by the GIL, since only one thread can execute Python bytecode at a time. However, threading is still valuable for I/O-bound work, because blocking I/O calls release the GIL while they wait, and for threads that execute C code that releases the GIL.
Multiprocessing: The multiprocessing module creates separate processes, each with its own Python interpreter and its own GIL. This approach bypasses the GIL and enables parallel execution of Python code. It comes at the cost of higher overhead, though: starting a process is more expensive than starting a thread, and data exchanged between processes typically has to be serialized.
Parallelization in C code: Many Python libraries, particularly those implemented in C, can utilize parallelism outside the constraints of the GIL. For example, libraries like NumPy leverage optimized C code to perform parallel computations efficiently.
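The following minimal sketch illustrates why threading still helps with I/O-bound work. As an assumption for illustration, time.sleep() stands in for blocking I/O. Like real blocking I/O calls, it releases the GIL while waiting. Four threads that each wait 0.2 seconds should therefore finish in roughly 0.2 seconds in total rather than 0.8 seconds. The function name and parameters are illustrative.

```python
import threading
import time


def run_blocking_threads(num_threads: int = 4, delay: float = 0.2) -> float:
    """Start num_threads threads that each block for `delay` seconds; return wall time."""
    # time.sleep() stands in for blocking I/O; it releases the GIL while waiting.
    threads = [
        threading.Thread(target=time.sleep, args=(delay,))
        for _ in range(num_threads)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_blocking_threads()
    # The waits overlap, so this is close to 0.2 s rather than 4 x 0.2 s = 0.8 s.
    print(f"4 threads x 0.2 s of waiting took {elapsed:.2f} s")
```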

Creating a GIL-Disabled Python Environment

Python 3.13 introduces a special build without the GIL, called free-threaded CPython. This build is separate from the standard CPython build, so it has to be installed explicitly.

The installation process is similar to installing any other Python version. In the installer's advanced options, simply check the ‘Download free-threaded binaries’ option. This installs python3.13t, the free-threaded build. In this build, the GIL is disabled by default, so no additional configuration is required.

Here’s how to run Python programs with and without the GIL:

# standard CPython build
python main.py

# Free threaded CPython build
python3.13t main.py

Using 3.13t with Docker

Another way to use the 3.13t version is by running the Python code in a Docker container. This is the approach I’ll demonstrate here. First, I’ll create a custom Docker image for the free-threaded CPython build, and then run the Python code inside the container.

1. Creating a Docker Image

No-GIL Image
As of now, there is no official Docker image for the free-threaded CPython build (as discussed in issue #947). To work around this, I created a custom Docker image based on the Dockerfile of the official python:3.13.1-bookworm image, modified for a GIL-free Python build.
The only change to the build itself is the --disable-gil flag added to the ./configure command. In addition, I installed the packages needed for the tests (pandas and pyarrow) from a requirements.txt file. Below is the complete Dockerfile used to build the GIL-disabled Python image:

FROM buildpack-deps:bookworm

# ensure local python is preferred over distribution python
ENV PATH=/usr/local/bin:$PATH

# runtime dependencies
RUN set -eux; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
        libbluetooth-dev \
        tk-dev \
        uuid-dev \
    ; \
    rm -rf /var/lib/apt/lists/*

ENV GPG_KEY=7169605F62C751356D054A26A821E680E5FA6305
ENV PYTHON_VERSION=3.13.1
ENV PYTHON_SHA256=9cf9427bee9e2242e3877dd0f6b641c1853ca461f39d6503ce260a59c80bf0d9

RUN set -eux; \
    \
    wget -O python.tar.xz "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz"; \
    echo "$PYTHON_SHA256 *python.tar.xz" | sha256sum -c -; \
    wget -O python.tar.xz.asc "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz.asc"; \
    GNUPGHOME="$(mktemp -d)"; export GNUPGHOME; \
    gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys "$GPG_KEY"; \
    gpg --batch --verify python.tar.xz.asc python.tar.xz; \
    gpgconf --kill all; \
    rm -rf "$GNUPGHOME" python.tar.xz.asc; \
    mkdir -p /usr/src/python; \
    tar --extract --directory /usr/src/python --strip-components=1 --file python.tar.xz; \
    rm python.tar.xz; \
    \
    cd /usr/src/python; \
    gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)"; \
    ./configure \
        --build="$gnuArch" \
        --enable-loadable-sqlite-extensions \
        --enable-optimizations \
        --enable-option-checking=fatal \
        --enable-shared \
        --with-lto \
        --with-ensurepip \
        --disable-gil \
    ; \
    nproc="$(nproc)"; \
    EXTRA_CFLAGS="$(dpkg-buildflags --get CFLAGS)"; \
    LDFLAGS="$(dpkg-buildflags --get LDFLAGS)"; \
    make -j "$nproc" \
        "EXTRA_CFLAGS=${EXTRA_CFLAGS:-}" \
        "LDFLAGS=${LDFLAGS:-}" \
    ; \
# https://github.com/docker-library/python/issues/784
# prevent accidental usage of a system installed libpython of the same version
    rm python; \
    make -j "$nproc" \
        "EXTRA_CFLAGS=${EXTRA_CFLAGS:-}" \
        "LDFLAGS=${LDFLAGS:--Wl},-rpath='\$\$ORIGIN/../lib'" \
        python \
    ; \
    make install; \
    \
# enable GDB to load debugging data: https://github.com/docker-library/python/pull/701
    bin="$(readlink -ve /usr/local/bin/python3)"; \
    dir="$(dirname "$bin")"; \
    mkdir -p "/usr/share/gdb/auto-load/$dir"; \
    cp -vL Tools/gdb/libpython.py "/usr/share/gdb/auto-load/$bin-gdb.py"; \
    \
    cd /; \
    rm -rf /usr/src/python; \
    \
    find /usr/local -depth \
        \( \
            \( -type d -a \( -name test -o -name tests -o -name idle_test \) \) \
            -o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' -o -name 'libpython*.a' \) \) \
        \) -exec rm -rf '{}' + \
    ; \
    \
    ldconfig; \
    \
    export PYTHONDONTWRITEBYTECODE=1; \
    python3 --version; \
    pip3 --version

# make some useful symlinks that are expected to exist ("/usr/local/bin/python" and friends)
RUN set -eux; \
    for src in idle3 pip3 pydoc3 python3 python3-config; do \
        dst="$(echo "$src" | tr -d 3)"; \
        [ -s "/usr/local/bin/$src" ]; \
        [ ! -e "/usr/local/bin/$dst" ]; \
        ln -svT "$src" "/usr/local/bin/$dst"; \
    done

WORKDIR /app

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

CMD ["python3"]

After creating the Dockerfile with the modifications described above, I built the Docker image with the tag python:3.13.1-bookworm-nogil:

docker build --no-cache --file python-3.13.1-bookworm-nogil.Dockerfile --tag python:3.13.1-bookworm-nogil .

GIL-enabled Image
I also created a Docker image with the GIL enabled. It uses the official python:3.13.1-bookworm base image and installs the same packages as the no-GIL image.

FROM python:3.13.1-bookworm

WORKDIR /app

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

CMD ["python3"]

I built the Docker image with the tag python:3.13.1-bookworm-gil:

docker build --no-cache --file python-3.13.1-bookworm-gil.Dockerfile --tag python:3.13.1-bookworm-gil .

2. Checking the GIL Status

The following Python script, gil_status.py, determines the status of the Global Interpreter Lock (GIL) in the current Python environment. It first prints the running Python version. It then uses the build configuration to check whether the GIL is enabled, disabled, or cannot be disabled at all in that version:

import sys
import sysconfig


def main():
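    # Print the running Python version
    print(f"Python version: {sys.version.split()[0]}")

    # Py_GIL_DISABLED is a build-time setting:
    #   None -> unknown to this version (Python 3.12 and earlier)
    #   0    -> standard build with the GIL
    #   1    -> free-threaded build (GIL disabled by default)
    status = sysconfig.get_config_var("Py_GIL_DISABLED")
    if status is None:
        print("GIL disabling is not supported in this Python version.")
    elif status == 0:
        print("GIL is active")
    else:
        print("GIL is disabled")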


if __name__ == "__main__":
    main()
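gil_status.py reads Py_GIL_DISABLED, which describes how the interpreter was built. On a free-threaded build, the GIL can still be switched back on at runtime. This happens with PYTHON_GIL=1, with the -X gil=1 option, or when an extension module that has not declared free-threading support is imported. The following complementary sketch checks the runtime state. It assumes Python 3.13's sys._is_gil_enabled(), a CPython-specific function whose leading underscore marks it as an implementation detail. On older versions it falls back to assuming the GIL is on. The function name gil_runtime_status is illustrative.

```python
import sys
import sysconfig


def gil_runtime_status() -> str:
    """Describe the GIL state of the running interpreter, not just its build."""
    # Build-time flag: 1 for free-threaded builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() only exists on Python 3.13+; older versions always use the GIL.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    if not free_threaded_build:
        return "standard build: GIL enabled"
    if is_gil_enabled():
        return "free-threaded build: GIL re-enabled at runtime"
    return "free-threaded build: GIL disabled"


if __name__ == "__main__":
    print(gil_runtime_status())
```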

Python 3.12

To check the GIL status in Python 3.12, I ran the gil_status.py script inside the official python:3.12 Docker container. The following command was used:

docker container run --rm -it -v ${pwd}/src:/app -w /app python:3.12 python gil_status.py

The result confirms that Python 3.12 does not support disabling the GIL.
Python version: 3.12.8
GIL disabling is not supported in this Python version.

Python 3.13 Standard Build (With GIL)

Next, I tested the gil_status.py script using the previously created python:3.13.1-bookworm-gil Docker container, which features the standard CPython build (with GIL). The following command was executed:

docker container run --rm -it -v ${pwd}/src:/app -w /app python:3.13.1-bookworm-gil python gil_status.py

The result shows that Python 3.13.1 recognizes the free-threading build option. However, this standard build is compiled with the GIL, so the GIL is active.
Python version: 3.13.1
GIL is active

Python 3.13 without GIL

Finally, I tested the gil_status.py script using the previously created Docker image, python:3.13.1-bookworm-nogil, which is built with the free-threaded CPython configuration:

docker container run --rm -it -v ${pwd}/src:/app -w /app python:3.13.1-bookworm-nogil python gil_status.py

The result shows that this free-threaded build runs without the Global Interpreter Lock (GIL), so threads can execute Python code truly in parallel.
Python version: 3.13.1
GIL is disabled

Performance Comparison With and Without GIL

Prime Number Calculation

To evaluate the impact of the GIL, I tested a classic CPU-intensive program that calculates prime numbers within a specified range (0 to 1 million). The program first runs in single-threaded mode, followed by a multi-threaded version utilizing 4 threads.

import threading
import time


def main():
    UPPER_LIMIT = 10**6  # Maximum number to check
    NUM_THREADS = 4

    # Single-threaded
    start_time = time.perf_counter()
    result = count_primes(0, UPPER_LIMIT)
    end_time = time.perf_counter()
    print(f"Single-threaded execution time: {end_time - start_time:.2f} seconds, primes found: {result}")

    # Multi-threaded
    start_time = time.perf_counter()
    result = threaded_count_primes(UPPER_LIMIT, NUM_THREADS)
    end_time = time.perf_counter()
    print(f"Multi-threaded ({NUM_THREADS} threads) execution time: {end_time - start_time:.2f} seconds, primes found: {result}")


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True


def count_primes(start: int, end: int) -> int:
    return sum(is_prime(i) for i in range(start, end))


def threaded_count_primes(n: int, num_threads: int) -> int:
    threads = []  # List to store thread objects
    results = [0] * num_threads  # Shared list to store results from each thread

    # Worker function to count primes in a given range
    def count_primes_in_range(start: int, end: int, index: int) -> None:
        results[index] = count_primes(start, end)

    # Helper function to calculate ranges for each thread
    def calculate_ranges(n: int, num_threads: int):
        step = n // num_threads
        for i in range(num_threads):
            start = i * step
            # Ensure the last thread includes any leftover range
            end = (i + 1) * step if i != num_threads - 1 else n
            yield start, end, i

    # Create and start threads for each range
    for start, end, index in calculate_ranges(n, num_threads):
        thread = threading.Thread(target=count_primes_in_range, args=(start, end, index))
        threads.append(thread)
        thread.start()

    # Wait for all threads to complete
    for thread in threads:
        thread.join()

    # Sum the results from all threads
    return sum(results)


if __name__ == '__main__':
    main()

Result

Running the program in the python:3.13.1-bookworm-gil (GIL-enabled) and python:3.13.1-bookworm-nogil (GIL-disabled) Docker containers reveals a clear difference in how the single-threaded and multi-threaded versions perform.

With GIL Enabled
In the GIL-enabled container, the single-threaded and multi-threaded versions have nearly identical execution times. This is expected because the GIL prevents the 4 threads from running in parallel, effectively serializing the execution.

# GIL Enabled
PS C:\Users\simon\dev\python-gil-comparison> docker container run --rm -it -v ${pwd}/src:/app -w /app python:3.13.1-bookworm-gil python prime_counter.py      
Single-threaded execution time: 3.96 seconds, primes found: 78498
Multi-threaded (4 threads) execution time: 4.16 seconds, primes found: 78498

With GIL Disabled
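In the GIL-disabled container, the multi-threaded version is nearly twice as fast as the single-threaded version (2.43 vs. 4.63 seconds). This improvement is due to the removal of the GIL, which allows the 4 threads to run truly in parallel on separate CPU cores. The speedup still stays below 4x. One likely reason is that the number range is split into four equal-sized parts even though checking larger numbers takes longer. As a result, the thread handling the highest numbers has the most work while the others finish early. Note also that the single-threaded run is slower than in the GIL-enabled container (4.63 vs. 3.96 seconds), which reflects the extra overhead of the free-threaded build.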

# GIL Disabled
PS C:\Users\simon\dev\python-gil-comparison> docker container run --rm -it -v ${pwd}/src:/app -w /app python:3.13.1-bookworm-nogil python prime_counter.py
Single-threaded execution time: 4.63 seconds, primes found: 78498
Multi-threaded (4 threads) execution time: 2.43 seconds, primes found: 78498
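Testing a number n for primality costs up to about √n divisions. Equal-sized contiguous ranges can therefore leave one thread with noticeably more work than the others. The following sketch shows a hypothetical alternative that was not benchmarked here. It splits the range into many small chunks and lets concurrent.futures.ThreadPoolExecutor hand them to a fixed number of worker threads, so a thread that finishes a cheap chunk simply picks up the next one. The function name pooled_count_primes and the default num_chunks=100 are illustrative choices. As with the original program, a speedup is only expected on the free-threaded build.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True


def count_primes(start: int, end: int) -> int:
    return sum(is_prime(i) for i in range(start, end))


def pooled_count_primes(n: int, num_threads: int, num_chunks: int = 100) -> int:
    """Count primes in [0, n) by handing many small ranges to a fixed thread pool."""
    step = max(1, n // num_chunks)
    starts = list(range(0, n, step))
    # Clamp each end to n so that the chunks exactly cover [0, n).
    ends = [min(start + step, n) for start in starts]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() calls count_primes(start, end) for each pair on the pool's threads.
        return sum(pool.map(count_primes, starts, ends))


if __name__ == "__main__":
    print(pooled_count_primes(10**6, num_threads=4))  # prints 78498
```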

Real-World Loan Risk Scoring Benchmark: With and Without GIL

The prime number example gives a good impression of what disabling the GIL can bring. However, calculating prime numbers has little to do with real-world business applications, so the next example aims to be closer to a real-world workload.

On Kaggle, I found a dataset with financial loan data (https://www.kaggle.com/datasets/gauravduttakiit/loan-defaulter) and wrote a program that calculates a risk score for each loan.
The program loads the input data and splits it into chunks of 50,000 rows. The single-threaded version calculates the risk scores chunk by chunk, while the multi-threaded version starts one thread per chunk. The risk score for each row is based on the income-to-loan ratio, the number of dependents, and the applicant's credit history. To create a computational load, a simulated formula repeats CPU-intensive operations (square roots and sine functions) 100 times per row. The result is normalized to a value between 0 and 1, where higher scores indicate higher risk. Loans with a score above 0.8 are counted as high-risk.

import math
import pandas as pd
import threading
import time


def main() -> None:
    INPUT_FILE = "/data/loan_data.parquet"
    NUM_THREADS = 4

    # Single-threaded execution
    start_time = time.perf_counter()
    result_single = risk_scoring_single_threaded(INPUT_FILE)
    end_time = time.perf_counter()
    print(f"Single-threaded execution time: {end_time - start_time:.2f} seconds, high-risk loans: {result_single}")

    # Multi-threaded execution
    start_time = time.perf_counter()
    result_multi = risk_scoring_multi_threaded(INPUT_FILE, NUM_THREADS)
    end_time = time.perf_counter()
    print(f"Multi-threaded execution time: {end_time - start_time:.2f} seconds, high-risk loans: {result_multi}")


def risk_scoring_single_threaded(path: str) -> int:
    chunks = load_parquet_in_chunks(path, chunk_size=50000)
    high_risk_loans = 0
    for chunk in chunks:
        high_risk_loans += process_chunk(chunk)
    return high_risk_loans


def risk_scoring_multi_threaded(path: str, num_threads: int) -> int:
    chunks = list(load_parquet_in_chunks(path, chunk_size=50000))  # Convert generator to list for repeatable indexing
    high_risk_counts = [0] * len(chunks)  # Ensure results list matches the number of chunks

    def worker(chunk_index):
        # Process the chunk and store the result in the corresponding index
        high_risk_counts[chunk_index] = process_chunk(chunks[chunk_index])

    # Create and start one thread per chunk
    # (num_threads is not used here, so the thread count equals the number of chunks)
    threads = []
    for i in range(len(chunks)):
        thread = threading.Thread(target=worker, args=(i,))
        threads.append(thread)
        thread.start()

    # Wait for all threads to complete
    for thread in threads:
        thread.join()

    # Return the total sum of high-risk loans
    return sum(high_risk_counts)


def process_chunk(chunk: pd.DataFrame) -> int:
    """Calculate risk scores for a chunk of data."""
    high_risk_loans = 0
    for _, row in chunk.iterrows():
        risk_score = calculate_risk_score(
            applicant_income=row.get("AMT_INCOME_TOTAL", 0),
            loan_amount=row.get("AMT_CREDIT", 1),
            dependents=row.get("CNT_CHILDREN", 0),
            credit_history=row.get("TARGET", 0)
        )

        if risk_score > 0.8:
            high_risk_loans += 1

    return high_risk_loans


def calculate_risk_score(applicant_income: float, loan_amount: float, dependents: int, credit_history: int) -> float:
    """Simulate a risk scoring computation with CPU-intensive operations."""
    try:
        income_to_loan_ratio = applicant_income / loan_amount
    except ZeroDivisionError:
        income_to_loan_ratio = 0

    risk = 0

    # Simulate a CPU-intensive operation for benchmarking purposes only
    for _ in range(100):  # Increase iterations to simulate CPU load
        risk += math.sqrt(income_to_loan_ratio) * math.sin(dependents) ** 2
        risk = (risk % 1) * 100

    risk -= credit_history * 10
    return max(0, min(1, risk / 100))  # Normalize between 0 and 1


def load_parquet_in_chunks(path, chunk_size):
    """Load a Parquet file and yield chunks of data."""
    df = pd.read_parquet(path)
    for i in range(0, len(df), chunk_size):
        yield df.iloc[i:i + chunk_size]


if __name__ == '__main__':
    main()
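As written, risk_scoring_multi_threaded starts one thread per 50,000-row chunk and never uses its num_threads argument. The following sketch caps the number of worker threads with concurrent.futures.ThreadPoolExecutor. The helper process_chunks_bounded is hypothetical: it is not part of the original benchmark and was not measured here. It takes any iterable of chunks and a per-chunk function, so it runs without pandas.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def process_chunks_bounded(
    chunks: Iterable[T],
    process_chunk: Callable[[T], int],
    num_threads: int,
) -> int:
    """Apply process_chunk to every chunk using at most num_threads worker threads."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Each worker takes the next chunk as soon as it is free; the
        # per-chunk counts come back in submission order and are summed.
        return sum(pool.map(process_chunk, chunks))


if __name__ == "__main__":
    # Toy data instead of DataFrame chunks: count the items in each "chunk".
    toy_chunks = [[1, 2], [3], [4, 5, 6]]
    print(process_chunks_bounded(toy_chunks, len, num_threads=2))  # 6
```

In the benchmark script, the body of risk_scoring_multi_threaded could then be reduced to a single call: process_chunks_bounded(load_parquet_in_chunks(path, chunk_size=50000), process_chunk, num_threads).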

Results

With GIL Enabled
As expected, the single-threaded and multi-threaded versions have nearly identical execution times, as in the previous test.

PS C:\Users\simon\dev\python-gil-comparison> docker container run --rm -it -v ${pwd}/src:/app -v ${pwd}/data:/data -w /app python:3.13.1-bookworm-gil python loan_risk_scoring_benchmark.py 
Single-threaded execution time: 47.97 seconds, high-risk loans: 19447 
Multi-threaded execution time: 47.74 seconds, high-risk loans: 19447

With GIL Disabled
In the GIL-disabled container, the single-threaded version runs about 30% slower than in the GIL-enabled environment (62.54 vs. 47.97 seconds). Part of this is probably the general single-threaded overhead of the free-threaded build, which also showed up in the prime number benchmark. In addition, some underlying operations in pandas or other C-based libraries may not yet be optimized for running without the GIL.

However, the multi-threaded version runs significantly faster. At 25.01 seconds, it needs only about half the time of the multi-threaded run in the GIL-enabled container (47.74 seconds). It is also about 2.5x faster than its own single-threaded run.

PS C:\Users\simon\dev\python-gil-comparison> docker container run --rm -it -v ${pwd}/src:/app -v ${pwd}/data:/data -w /app -e PYTHON_GIL=0 python:3.13.1-bookworm-nogil python loan_risk_scoring_benchmark.py
Single-threaded execution time: 62.54 seconds, high-risk loans: 19447
Multi-threaded execution time: 25.01 seconds, high-risk loans: 19447

Performance Analysis and Observations

Disabling the Global Interpreter Lock (GIL) resulted in a significant performance improvement for the multi-threaded runs in both benchmarks.

In the Prime Number Calculation, the multi-threaded run took 2.43 seconds without the GIL compared to 4.16 seconds with it. That is about 1.7x faster and demonstrates a clear advantage for CPU-bound tasks that can spread their work across multiple threads.
In the Loan Risk Scoring Benchmark, the multi-threaded run took 25.01 seconds without the GIL compared to 47.74 seconds with it, about 1.9x faster. In both benchmarks, however, the single-threaded runs were slower without the GIL (4.63 vs. 3.96 seconds and 62.54 vs. 47.97 seconds). This is likely due to the overhead of the free-threaded build itself, and possibly to operations in pandas or other C-based libraries that are not yet fully optimized for a GIL-free Python.

These results confirm that removing the GIL benefits multi-threaded execution, but the exact impact depends on the workload and underlying libraries.

Looking Ahead

The removal of the Global Interpreter Lock (GIL) has been a long-standing request in the Python community. With the growing demand for AI, data processing, and multi-threaded performance, the topic has gained even more importance. While alternative implementations like Jython have never had a GIL, having this option in CPython, the most widely used interpreter, is a significant milestone.

Currently, in Python 3.13, disabling the GIL is experimental and should not be used in production. Many widely used packages, such as pandas, Django, and FastAPI, rely on the GIL and are not yet fully tested in a GIL-free environment. In the Loan Risk Scoring Benchmark, pandas automatically re-enabled the GIL, so I had to explicitly disable it again with PYTHON_GIL=0. This happens whenever an extension module that has not declared free-threading support is loaded, and other frameworks may also show stability or performance problems in a No-GIL environment.
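When a free-threaded interpreter re-enables the GIL to load such a module, CPython emits a RuntimeWarning naming the module. To confirm this for a specific package, you can compare the GIL state before and after importing it. The following minimal sketch does that. It assumes Python 3.13's sys._is_gil_enabled() and returns None on older versions. The function name gil_reenabled_by is illustrative. The check is only meaningful in a fresh interpreter, because a module that is already imported is not loaded again.

```python
import importlib
import sys
from typing import Optional


def gil_reenabled_by(module_name: str) -> Optional[bool]:
    """Import module_name and report whether that import switched the GIL back on."""
    is_gil_enabled = getattr(sys, "_is_gil_enabled", None)
    if is_gil_enabled is None:
        return None  # Before 3.13 there is no free-threaded build to check
    before = is_gil_enabled()
    importlib.import_module(module_name)
    after = is_gil_enabled()
    # True only if the GIL was off before the import and on afterwards.
    return (not before) and after


if __name__ == "__main__":
    # Pass a module name on the command line, e.g. pandas; defaults to json.
    name = sys.argv[1] if len(sys.argv) > 1 else "json"
    print(f"{name} re-enabled the GIL: {gil_reenabled_by(name)}")
```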

In the future, No-GIL Python is expected to become an officially supported option, but it will likely not be the default for a long time. Based on current developments, I expect No-GIL to become stable around Python 3.16 or 3.17 and potentially the default in Python 3.20 or 3.21, but this transition will take time. Additionally, removing the GIL introduces new challenges, such as increased complexity in memory management and thread safety, which developers will need to address.
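Thread safety is one of these challenges. Even with the GIL, counter += 1 on shared state is a read-modify-write sequence rather than a single atomic step. Once threads truly run in parallel, protecting such state explicitly matters even more. The following is a minimal sketch using threading.Lock. The function name and parameters are illustrative.

```python
import threading


def count_with_lock(num_threads: int = 4, increments: int = 10_000) -> int:
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # counter += 1 reads, adds, and writes back; the lock makes
            # each increment happen without interference from other threads.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock())  # 40000
```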

Conclusion

Disabling the GIL in Python unlocks significant performance improvements for multi-threaded, CPU-bound tasks. While the benchmarks showed clear advantages, challenges remain: many libraries still rely on the GIL, and stability issues persist. As Python continues to evolve, No-GIL support will mature, but widespread adoption will take time. For now, it’s an exciting glimpse into Python’s multi-threaded future.

Full Code and Resources

All Dockerfiles, data, and benchmark scripts used in this post are available in my GitHub repository. You can check out the full code and experiment with the tests yourself:
→ GitHub Repository: python-gil-comparison

Exploring Python 3.13: Hands-On with the GIL Disablement
