Advisory

Vulnerability chain in NVIDIA Triton Inference Server enables complete AI server takeover

Take action: If you're using NVIDIA Triton Inference Server for AI model deployment, plan a quick upgrade to version 25.07 or newer. In the meantime, make sure it's isolated from the internet and accessible only from trusted networks.



NVIDIA has patched a series of security vulnerabilities in its Triton Inference Server, a widely used open-source platform for deploying artificial intelligence models. Some of the vulnerabilities can be chained together to give remote, unauthenticated attackers complete control over vulnerable servers, potentially compromising valuable AI models and the sensitive data these systems process.

NVIDIA Triton Inference Server is used by over 25,000 companies globally, including major enterprises such as Microsoft, Amazon, Oracle, Siemens, and American Express for optimizing AI model deployment. 

Vulnerability Chain Attack 

The vulnerability chain was discovered by security researchers at Wiz, who showed how seemingly minor flaws can be combined into a complete exploit. The chained vulnerabilities are:

  • CVE-2025-23319 (CVSS score 8.1) - A Python backend out-of-bounds write vulnerability exploitable through malicious requests, central to the vulnerability chain.
  • CVE-2025-23320 (CVSS score 7.5) - A Python backend vulnerability where attackers can exceed shared memory limits through oversized requests, leading to information disclosure.
  • CVE-2025-23334 (CVSS score 5.9) - A Python backend out-of-bounds read vulnerability, also part of the chain.

Step 1: Information Disclosure - Attackers send crafted, oversized requests to the Python backend, triggering an exception that causes error messages to leak the full name of the backend's internal IPC shared memory region (e.g., "triton_python_backend_shm_region_4f50c226-b3d0-46e8-ac59-d4690b28b859").

Step 2: Shared Memory API Abuse - Using the leaked memory name, attackers exploit Triton's legitimate shared memory API, which lacks validation to distinguish between user-owned and internal memory regions, allowing them to register the internal shared memory key and gain read/write access to the Python backend's private memory space.

Step 3: Remote Code Execution - With memory access established, attackers can corrupt data structures, manipulate pointers for out-of-bounds memory access, and craft malicious IPC messages to achieve full system compromise, potentially leading to AI model theft, data breaches, response manipulation, and network lateral movement.
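
Steps 1 and 2 depend on two gaps: error messages that expose an internal region name, and a registration API that accepts that name. The Python sketch below illustrates both kinds of check in isolation. It is not NVIDIA's patch and not Triton code. The function names are hypothetical, and the prefix is an assumption taken from the example region name quoted in Step 1.

```python
import re

# Assumption: internal regions share the prefix seen in the leaked example name.
INTERNAL_SHM_PREFIX = "triton_python_backend_shm_region_"

# Matches the prefix followed by an identifier such as a UUID.
_INTERNAL_NAME = re.compile(re.escape(INTERNAL_SHM_PREFIX) + r"[0-9A-Za-z_-]+")


def redact_internal_shm_names(message: str) -> str:
    """Hypothetical helper: hide internal region names in client-facing errors."""
    return _INTERNAL_NAME.sub("<internal-shm-region>", message)


def is_registrable_shm_key(key: str) -> bool:
    """Hypothetical helper: refuse keys that name a backend-private region."""
    # POSIX shared memory keys are often written with a leading slash.
    return not key.lstrip("/").startswith(INTERNAL_SHM_PREFIX)
```

A prefix check like this is only one possible design. Tracking which regions the server created for its own use and refusing to register any of them would not depend on a naming convention.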

NVIDIA patched the chained flaws as part of a larger security update that addresses the following vulnerabilities:

  • CVE-2025-23310 (CVSS score 9.8) - A stack buffer overflow vulnerability affecting NVIDIA Triton Inference Server for Windows and Linux, where attackers can cause overflow through specially crafted inputs, potentially leading to remote code execution, denial of service, information disclosure, and data tampering.
  • CVE-2025-23311 (CVSS score 9.8) - Another stack overflow vulnerability where attackers can exploit the system through specially crafted HTTP requests, with similar impacts including remote code execution, denial of service, information disclosure, and data tampering.
  • CVE-2025-23317 (CVSS score 9.1) - A vulnerability in the HTTP server component allowing attackers to start a reverse shell by sending specially crafted HTTP requests, potentially resulting in remote code execution, denial of service, data tampering, and information disclosure.
  • CVE-2025-23318 (CVSS score 8.1) - A Python backend vulnerability causing out-of-bounds write conditions that can lead to code execution, denial of service, data tampering, and information disclosure.
  • CVE-2025-23319 (CVSS score 8.1) - Another Python backend out-of-bounds write vulnerability exploitable through malicious requests, central to the vulnerability chain discovered by Wiz Research.
  • CVE-2025-23320 (CVSS score 7.5) - A Python backend vulnerability where attackers can exceed shared memory limits through oversized requests, leading to information disclosure.
  • CVE-2025-23321 (CVSS score 7.5) - A divide-by-zero vulnerability triggered by invalid requests, causing denial of service.
  • CVE-2025-23322 (CVSS score 7.5) - A double-free vulnerability occurring when streams are cancelled before processing, resulting in denial of service.
  • CVE-2025-23323 (CVSS score 7.5) - An integer overflow vulnerability causing segmentation faults through invalid requests.
  • CVE-2025-23324 (CVSS score 7.5) - Another integer overflow vulnerability with similar impacts to CVE-2025-23323.
  • CVE-2025-23325 (CVSS score 7.5) - An uncontrolled recursion vulnerability exploitable through specially crafted inputs.
  • CVE-2025-23326 (CVSS score 7.5) - An integer overflow vulnerability causing denial of service through crafted inputs.
  • CVE-2025-23327 (CVSS score 7.5) - An integer overflow vulnerability that can lead to both denial of service and data tampering.
  • CVE-2025-23331 (CVSS score 7.5) - A memory allocation vulnerability causing segmentation faults through invalid requests.
  • CVE-2025-23333 (CVSS score 5.9) - A Python backend out-of-bounds read vulnerability through shared memory manipulation.
  • CVE-2025-23334 (CVSS score 5.9) - Another Python backend out-of-bounds read vulnerability, part of the Wiz Research vulnerability chain.
  • CVE-2025-23335 (CVSS score 4.4) - A TensorRT backend underflow vulnerability affecting specific model configurations.

Affected Systems and Versions

  • CVE-2025-23323, CVE-2025-23324, CVE-2025-23325, CVE-2025-23326, CVE-2025-23327, and CVE-2025-23335 affect all versions prior to 25.05.
  • CVE-2025-23322 and CVE-2025-23331 impact all versions prior to 25.06.
  • CVE-2025-23310, CVE-2025-23311, CVE-2025-23317, CVE-2025-23318, CVE-2025-23319, CVE-2025-23320, CVE-2025-23321, CVE-2025-23333, and CVE-2025-23334 affect all versions prior to 25.07.
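
The version breakdown above can be turned into a quick lookup for auditing deployments. The following Python sketch transcribes the three groups and reports which CVEs remain unpatched for a given release. The function names are hypothetical, and it assumes release strings use the two-part "YY.MM" form shown in this advisory.

```python
# First fixed release -> CVEs it closes, transcribed from the list above.
FIXED_IN = {
    "25.05": [
        "CVE-2025-23323", "CVE-2025-23324", "CVE-2025-23325",
        "CVE-2025-23326", "CVE-2025-23327", "CVE-2025-23335",
    ],
    "25.06": ["CVE-2025-23322", "CVE-2025-23331"],
    "25.07": [
        "CVE-2025-23310", "CVE-2025-23311", "CVE-2025-23317",
        "CVE-2025-23318", "CVE-2025-23319", "CVE-2025-23320",
        "CVE-2025-23321", "CVE-2025-23333", "CVE-2025-23334",
    ],
}


def parse_release(release: str) -> tuple[int, int]:
    """Split an assumed 'YY.MM' release string into comparable integers."""
    year, month = release.strip().split(".")
    return int(year), int(month)


def unpatched_cves(release: str) -> list[str]:
    """Return the CVEs from this advisory that still affect `release`."""
    current = parse_release(release)
    found = []
    for fixed, cves in FIXED_IN.items():
        # A CVE applies to every release earlier than its first fixed one.
        if current < parse_release(fixed):
            found.extend(cves)
    return sorted(found)
```

For example, `unpatched_cves("25.06")` returns the nine CVEs fixed in 25.07, while `unpatched_cves("25.07")` returns an empty list.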

Users should upgrade to NVIDIA Triton Inference Server version 25.07 or newer. Organizations should also follow the Secure Deployment Considerations Guide and ensure that logging and shared memory APIs are protected for use by authorized users only. 
