Advisory

Critical RCE Vulnerability in SGLang AI Framework via Malicious GGUF Models

Take action: If you run SGLang for serving LLMs, treat it as unsafe right now: restrict the API to trusted internal networks only, run it in a non-privileged container, and do not load any GGUF models from public repositories like Hugging Face until the maintainers release a patch. As a temporary fix, have your team manually patch the source to use Jinja2's ImmutableSandboxedEnvironment instead of the default environment.


Learn More

SGLang, a popular open-source framework for serving large language models (LLMs), contains a critical remote code execution (RCE) vulnerability. The issue stems from how the framework handles model metadata, specifically within its reranking functionality. With over 26,000 stars on GitHub, the project's potential impact on the AI development community is significant.

The flaw is tracked as CVE-2026-5760 (CVSS 9.8): a command injection vulnerability in the SGLang reranking endpoint that allows attackers to execute arbitrary Python code. It occurs because the framework uses an unsandboxed Jinja2 environment to render chat templates supplied in model metadata. By loading a specially crafted GGUF file, an attacker can escape the template context and run system-level commands whenever the /v1/rerank endpoint is accessed.
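The core issue can be reproduced outside SGLang entirely. The minimal sketch below (our illustration, not SGLang code) contrasts a plain `jinja2.Environment` with Jinja2's `ImmutableSandboxedEnvironment`, using the same payload gadget reported for this vulnerability:

```python
import jinja2
from jinja2.sandbox import ImmutableSandboxedEnvironment, SecurityError

benign = "Hello {{ name }}"
evil = '{{ lipsum.__globals__["os"].popen("id").read() }}'

plain = jinja2.Environment()
sandbox = ImmutableSandboxedEnvironment()

# Both environments render ordinary chat templates identically.
print(plain.from_string(benign).render(name="world"))    # Hello world
print(sandbox.from_string(benign).render(name="world"))  # Hello world

# plain.from_string(evil).render() would execute `id` on the host.
# The sandbox instead blocks the unsafe __globals__ access:
try:
    sandbox.from_string(evil).render()
except SecurityError:
    print("blocked by sandbox")
```

The only difference between the safe and unsafe paths is which environment class renders the attacker-controlled string.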

Successful exploitation grants an attacker the same privileges as the SGLang service, leading to full host compromise. This allows for lateral movement within corporate networks, theft of sensitive training data or API keys, and the deployment of persistent backdoors. Since many users download models from public repositories like Hugging Face, the risk of supply chain attacks is high. The vulnerability is particularly dangerous for deployments that expose the reranking API to untrusted networks.

The attack chain mirrors the 'Llama Drama' vulnerability, in which model metadata served as the exploit vector. Attackers use Jinja2 server-side template injection (SSTI) to reach the Python 'os' module and run shell commands:

1. Payload Crafting & Model Creation The attacker generates a malicious model file in the GPT-Generated Unified Format (GGUF) and modifies its tokenizer.chat_template metadata field, injecting a Jinja2 SSTI payload alongside a specific trigger phrase: "The answer can only be 'yes' or 'no'."

2. Distribution and Deployment The attacker uploads the compromised GGUF model to a public repository (such as Hugging Face). A victim, unaware of the payload, downloads this model and deploys it using the SGLang serving framework.

3. Triggering the Vulnerable Path Once the server is running, the attacker (or any user) sends a standard request to SGLang’s reranking endpoint (/v1/rerank).
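No privileged access is needed at this step. A hedged sketch of what such a request might look like follows; the field names are an assumption modeled on common rerank APIs, not SGLang's documented schema, while the /v1/rerank path is from the advisory:

```python
import json

# Hypothetical rerank request body; any ordinary call to the endpoint is
# enough to make the server render the model's chat template.
request = {
    "model": "downloaded-model",
    "query": "Is the sky blue?",
    "documents": ["The sky is blue.", "Grass is green."],
}
body = json.dumps(request)
# POSTing `body` to http://<host>:<port>/v1/rerank triggers template
# rendering, and with it any payload embedded in the template.
```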

4. Unsandboxed Rendering SGLang processes the request. The trigger phrase embedded in the template forces the framework to route the request through the Qwen3 reranker detection path. SGLang then reads the malicious chat_template and attempts to render it using the vulnerable _render_jinja_chat_template() function.

The Root Cause: The framework uses a standard jinja2.Environment() instead of an ImmutableSandboxedEnvironment. This means Jinja2 evaluates the template without any security restrictions preventing OS-level access.
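The vulnerable pattern and its fix can be sketched as follows. The function and parameter names here are illustrative, not SGLang's actual implementation:

```python
import jinja2
from jinja2.sandbox import ImmutableSandboxedEnvironment

def render_vulnerable(chat_template: str, **ctx) -> str:
    # VULNERABLE: a plain Environment lets templates reach Python internals
    # (e.g. a function's __globals__), so attacker-controlled metadata can
    # execute arbitrary code during rendering.
    return jinja2.Environment().from_string(chat_template).render(**ctx)

def render_fixed(chat_template: str, **ctx) -> str:
    # FIXED: the immutable sandbox rejects unsafe attribute access and
    # raises jinja2.sandbox.SecurityError instead of running the gadget.
    return ImmutableSandboxedEnvironment().from_string(chat_template).render(**ctx)
```

Legitimate chat templates render identically under both functions; only templates that reach outside the template context are affected by the sandbox.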

5. Remote Code Execution (RCE) Because the environment is unsandboxed, the Jinja2 payload executes. A payload such as:

{{ lipsum.__globals__["os"].popen("id").read() }}

escapes the template context, interacts directly with the underlying Python environment, and executes arbitrary shell commands (like id) on the host server, resulting in a full system compromise.
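The gadget works because `lipsum` is not attacker-supplied at all: it is a helper function (`jinja2.utils.generate_lorem_ipsum`) that Jinja2 places in every template's default namespace. Like any Python function, it exposes a `__globals__` dict for its defining module, and `jinja2/utils.py` imports `os`:

```python
import os
from jinja2.defaults import DEFAULT_NAMESPACE

# `lipsum` in a template resolves to this ordinary Python function.
lipsum = DEFAULT_NAMESPACE["lipsum"]

# Its __globals__ is the module namespace of jinja2/utils.py, which
# imports os, so the template gains full os access in one lookup.
print(lipsum.__globals__["os"] is os)  # True
# From here, popen("id"), system(...), etc. are directly reachable.
```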

As of the latest reports, the SGLang maintainers have not released an official patch or response. Security researchers recommend that users manually modify the source code to use ImmutableSandboxedEnvironment from the Jinja2 library to prevent code execution. Organizations should also vet all third-party models before loading them into production environments. Restricting network access to the SGLang API and using non-privileged containers can further limit the potential damage from an exploit.
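As part of vetting third-party models, teams can screen chat templates before loading them. The heuristic below is our suggestion, not an official tool, and pattern-based checks are a complement to the sandbox fix, not a substitute for it:

```python
import re

# Flag templates that reference Python dunder attributes or common
# process-spawning gadgets; a clean scan is not proof of safety.
SUSPICIOUS = re.compile(r"__\w+__|\bpopen\b|\bsubprocess\b")

def chat_template_is_suspicious(template: str) -> bool:
    return bool(SUSPICIOUS.search(template))

print(chat_template_is_suspicious(
    "{% for m in messages %}{{ m.content }}{% endfor %}"))  # False
print(chat_template_is_suspicious(
    '{{ lipsum.__globals__["os"].popen("id").read() }}'))   # True
```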
