Advisory

Google Cloud Dataproc clusters vulnerable to remote code execution

Take action: If you use Google Cloud Dataproc, review how your clusters are deployed and what public attack surface they expose. Ideally, no system should be reachable from public networks without authentication, and every public-facing component should be well protected and frequently tested.


Learn More

A critical vulnerability has been reported in Google Cloud Dataproc, related to the managed Open Source Software (OSS) solution. It presents a risk of unauthorized access and manipulation by attackers who know the Dataproc cluster's IP address.

Google Cloud Dataproc is a cloud-based service provided by Google Cloud Platform (GCP) that offers fast, easy-to-use, and fully managed processing for big data workloads. It is designed to handle various big data processing tasks, including batch processing, querying, streaming, and machine learning. Like many cloud services, Dataproc operates on a multi-tenant architecture, but it allows users to create their own isolated instances within this environment. When a user deploys a Dataproc cluster, they are essentially creating a separate instance that is dedicated to their specific use.

Cybersecurity researchers at Orca Security discovered the flaw and reported it to Google's Security Team, which labeled it an 'Abuse Risk' but has not yet taken any action. The issue combines weaknesses in Apache Hadoop's web interfaces with users' tendency to rely on default settings. The attack starts from a Compute Engine instance that is exposed to the internet and vulnerable to Remote Code Execution (RCE). After compromising that instance, an attacker can scan for open ports and reach critical web interfaces such as the YARN ResourceManager and the HDFS NameNode, which typically require no authentication.
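To gauge exposure from the defender's side, the reachability step described above can be sketched as a simple TCP connect check, run from an instance you own against a cluster you own. This is a minimal sketch under stated assumptions: the port numbers are the common Hadoop defaults (8088 for the YARN ResourceManager UI, 9870 for the HDFS NameNode UI in Hadoop 3.x, 50070 in Hadoop 2.x) and are not taken from the advisory, and the function names are hypothetical. Your cluster may use different ports.

```python
import socket

# Assumed default web UI ports (not stated in the advisory; verify for your cluster).
HADOOP_UI_PORTS = {
    8088: "YARN ResourceManager",
    9870: "HDFS NameNode (Hadoop 3.x)",
    50070: "HDFS NameNode (Hadoop 2.x)",
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def reachable_hadoop_uis(host, ports=HADOOP_UI_PORTS, timeout=1.0):
    """Return the subset of `ports` (port -> label) that accepts connections on `host`."""
    return {
        port: label
        for port, label in ports.items()
        if port_is_open(host, port, timeout)
    }
```

Calling `reachable_hadoop_uis("<cluster-master-internal-ip>")` from an ordinary Compute Engine instance in the same VPC should ideally return an empty dictionary. Any entry means that instance, and anyone who compromises it, can reach the listed interface.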

This lack of authentication could allow unauthorized access to the Apache Hadoop Distributed File System (HDFS), endangering sensitive data. Research indicates that many organizations using Dataproc deploy at least one cluster on a subnet of the default VPC, which increases their exposure. The 'default' VPC, which permits inbound connections on internal subnets, could expose Dataproc clusters and Compute Engine instances to security risks.

Google's Dataproc documentation does caution about the potential security risks and advises against open firewall rules on public networks. However, it does not address the scenario in which an attacker gains an initial foothold on a Compute Engine instance and uses it to obtain unauthenticated access to GCP Dataproc.

To mitigate this threat, researchers suggest network segmentation: using dedicated VPCs with custom firewall rules, and deploying clusters in separate subnets with tightly controlled public access.
