Advisory

Critical XXE vulnerability reported in Apache Tika, exploitable via malicious PDFs

Take action: If you use Apache Tika for document processing, upgrade both tika-core and tika-parser-pdf-module to version 3.2.2 or later as soon as possible. Upgrading only the PDF module does not fix this critical vulnerability, because the flaw itself is in tika-core. Until you can upgrade, restrict PDF file uploads to verified, trusted sources.


Learn More

Apache has patched a critical security vulnerability in Apache Tika, an open-source toolkit used by thousands of organizations to extract text and metadata from various document formats including PDFs, Word files, and images. 

An XXE (XML External Entity) flaw allows attackers to trick an application into processing malicious XML that references external files or systems. This enables attackers to read sensitive files from the server, steal data, or make unauthorized requests to internal network resources. For example, an attacker uploads a PDF containing hidden XML code like this:

<?xml version="1.0"?> 
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<data>&xxe;</data>

When the vulnerable application processes this file, instead of just reading the PDF content, it follows the instruction to read the /etc/passwd file from the server and returns it to the attacker, exposing sensitive system information.
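To illustrate why the payload above works only against a permissively configured parser, here is a minimal sketch of a generic hardened XML parser setup in Java. It is not Tika's actual fix, and the class and method names (XxeHardeningSketch, hardenedFactory, isRejected) are hypothetical. It assumes the JDK's built-in JAXP parser, which supports the Xerces disallow-doctype-decl feature.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical illustration class; not part of Apache Tika.
public class XxeHardeningSketch {

    // Builds a DOM parser factory that refuses DOCTYPE declarations,
    // which removes the mechanism used to declare external entities.
    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Xerces feature supported by the JDK's default parser (assumption: default JAXP implementation).
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    // Returns true if the hardened parser refuses the document, false if it parses cleanly.
    static boolean isRejected(String xml) {
        try {
            DocumentBuilder builder = hardenedFactory().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true; // parser raised a fatal error, e.g. for the forbidden DOCTYPE
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Unexpected parser setup or I/O failure", e);
        }
    }

    public static void main(String[] args) {
        String payload = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>\n"
                + "<data>&xxe;</data>";
        System.out.println("XXE payload rejected: " + isRejected(payload));
        System.out.println("Plain XML rejected: " + isRejected("<data>ok</data>"));
    }
}
```

Rejecting DOCTYPE declarations outright is the bluntest option; applications that genuinely need DTDs would instead have to disable external general and parameter entities individually.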

The vulnerability is tracked as CVE-2025-66516 (CVSS score 10.0) and allows attackers to compromise systems by uploading specially crafted PDF files containing malicious XFA (XML Forms Architecture) content. The malicious content interferes with the application's processing of XML data, making it possible to trigger requests to internal resources or third-party servers.

The affected packages are:

  • org.apache.tika:tika-core versions 1.13 through 3.2.1 (Patched in version 3.2.2)
  • org.apache.tika:tika-parser-pdf-module versions 2.0.0 through 3.2.1 (Patched in version 3.2.2)
  • org.apache.tika:tika-parsers versions 1.13 up to, but not including, 2.0.0 (Patched in version 2.0.0)
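
The ranges above can be encoded as a small check, for example in a dependency audit script. The following is a minimal sketch, not an official tool: the class and method names (TikaVersionCheckSketch, isAffected) are hypothetical, and it assumes purely numeric dotted versions, so qualifiers such as "-BETA" would need extra handling.

```java
import java.util.Arrays;

// Hypothetical helper; encodes only the version ranges listed in this advisory.
public class TikaVersionCheckSketch {

    // Splits "3.2.1" into {3, 2, 1}. Assumes numeric components only.
    static int[] parse(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    // Compares two dotted versions component by component, treating missing components as 0.
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Returns true if the artifact at this version falls in an affected range.
    static boolean isAffected(String artifactId, String version) {
        switch (artifactId) {
            case "tika-core":
                return compare(version, "1.13") >= 0 && compare(version, "3.2.1") <= 0;
            case "tika-parser-pdf-module":
                return compare(version, "2.0.0") >= 0 && compare(version, "3.2.1") <= 0;
            case "tika-parsers":
                return compare(version, "1.13") >= 0 && compare(version, "2.0.0") < 0;
            default:
                return false; // artifact not named in the advisory
        }
    }

    public static void main(String[] args) {
        System.out.println("tika-core 3.2.1 affected: " + isAffected("tika-core", "3.2.1"));
        System.out.println("tika-core 3.2.2 affected: " + isAffected("tika-core", "3.2.2"));
    }
}
```

Because the flaw is in tika-core, a deployment should be treated as vulnerable if any of the artifacts on its classpath is reported as affected, not only the PDF module.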

CVE-2025-66516 is related to CVE-2025-54988, another XXE flaw in the content detection and analysis framework that the project maintainers patched in August 2025. The new CVE expands the scope of affected packages: the entry point for the vulnerability was tika-parser-pdf-module, but the vulnerability itself, and therefore its fix, is in tika-core. Users who upgraded tika-parser-pdf-module but did not upgrade tika-core to version 3.2.2 or later are still vulnerable to attack. The original report also overlooked that in older Tika 1.x releases the PDFParser was packaged in the "org.apache.tika:tika-parsers" module, so legacy systems could be vulnerable even if users believed they had patched the issue.

Users are strongly advised to upgrade tika-core to version 3.2.2 or later as soon as possible. Organizations using older 1.x versions should contact their software vendor for patched releases and should not wait for automatic updates. As a temporary mitigation, organizations should limit PDF file uploads to trusted sources only.
