What is the reported vulnerability in Google Cloud Dataproc?

A critical vulnerability has been reported in Google Cloud Dataproc, related to the Open Source Software managed solution, presenting a risk of unauthorized access and manipulation for attackers who know the Dataproc cluster IP address.

What is Google Cloud Dataproc?

Google Cloud Dataproc is a cloud-based service from Google Cloud Platform that provides fast, managed processing for big data workloads, including tasks like batch processing, querying, streaming, and machine learning. It operates on a multi-tenant architecture but allows users to create isolated instances for their specific use.

What mitigation strategies are recommended for the Dataproc vulnerability?

To mitigate this threat, it is recommended to use network segmentation, create dedicated VPCs with custom firewall rules, and deploy clusters in separate subnets with controlled public access.

Advisory

Google Cloud Dataproc clusters vulnerable to remote code execution

Q: How was the Dataproc vulnerability discovered and reported?

The vulnerability in Dataproc was discovered by Orca Security and reported to Google's Security Team. It exploits weaknesses in Apache Hadoop’s web interfaces and users' tendency to rely on default settings, potentially leading to unauthenticated access to HDFS and sensitive data.

Q: What are the potential risks and attack vectors associated with this vulnerability?

The vulnerability can lead to unauthorized access to the Apache Hadoop Distributed File System (HDFS), especially in Compute Engine instances vulnerable to Remote Code Execution (RCE) and exposed to the internet. It also poses a risk to Dataproc clusters and Compute Engine instances, particularly those deployed on the default subnet VPC.

published: Dec. 13, 2023

Take action: If you are using Google Cloud Dataproc, review the implementation and public attack surface of your environment. Ideally, your systems should be isolated from public access without authentication and all public components should be well protected and frequently tested.

Learn More

A critical vulnerability in Google Cloud Dataproc , related to the Open Source Software (OSS) managed solution, has been reported, presenting a risk of unauthorized access and manipulation for attackers aware of the Dataproc cluster IP address.

Google Cloud Dataproc is a cloud-based service provided by Google Cloud Platform (GCP) that offers fast, easy-to-use, and fully managed processing for big data workloads. It is designed to handle various big data processing tasks, including batch processing, querying, streaming, and machine learning. Like many cloud services, Dataproc operates on a multi-tenant architecture, but it allows users to create their own isolated instances within this environment. When a user deploys a Dataproc cluster, they are essentially creating a separate instance that is dedicated to their specific use.

Discovered by cybersecurity experts at Orca Security, the flaw was reported to Google's Security Team, who have labeled it as an 'Abuse Risk' but have not yet taken any action. The vulnerability exploits weaknesses in Apache Hadoop’s web interfaces and users' tendency to rely on default settings. The attack vector involves a Compute Engine instance vulnerable to Remote Code Execution (RCE) and exposed to the internet. An attacker can exploit this to scan for open ports and access critical web interfaces like YARN ResourceManager and HDFS NameNode, which are typically not authenticated.

This lack of authentication could allow unauthorized access to the Apache Hadoop Distributed File System (HDFS), endangering sensitive data. Research indicates that many organizations using Dataproc deploy at least one cluster on the default subnet VPC, increasing the vulnerability. The 'default' VPC, which permits inbound connections on internal subnets, could expose Dataproc clusters and Compute Engine instances to security risks.

Google's Dataproc documentation does caution about the potential security risks and advises against open firewall rules on public networks. However, it does not address the scenario where an attacker gains an initial foothold on a Compute Engine instance, leading to unauthenticated access to GCP Dataproc.

To mitigate this threat, researchers suggest network segmentation, using dedicated VPCs with custom firewall rules, and deploying clusters in separate subnets with very controlled public access.

Google Cloud Dataproc clusters vulnerable to remote code execution

Read More
Microsoft reports that massive 10hr Azure outage caused …
ALBeast vulnerability exposes AWS Application Load Balancer Configuration
AWS Cloud Development Kit flaw lets attackers gain …
Tenable reports possible compromise vector in Google Cloud …
Microsoft patches critical authentication bypass flaw in Azure …