Knowledge

GitHub CFOR "feature": repo data is not really deleted, and most of our mistakes are there to stay forever

Take action: GitHub's position is clear: if you commit something to a repository, consider it compromised. Secrets and passwords you have committed are not safe just because you deleted them. Any fork of the code can be used to reach the commit that contains the password. Similarly, private code in a fork of a publicly accessible repo is not secret. Be very mindful of what you open-source. Ideally, create a new repo from scratch with a manual copy of the code rather than a fork.


Learn More

GitHub's repository network exposes data in a way that poses a serious threat to organizations and individuals using the platform.

Known as Cross Fork Object Reference (CFOR), this exposure provides a way to access data from deleted, private, or upstream repositories through their forks. It works much like an Insecure Direct Object Reference (IDOR) vulnerability: a user who supplies a commit hash directly can view commit data they would otherwise not be able to see.

GitHub stores repositories and forks in a repository network, with the original “upstream” repository acting as the root node. When an “upstream” repository that has been forked is “deleted”, GitHub reassigns the root node role to one of the downstream forks.

The deleted data is still accessible by directly accessing the commit hash.
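
To make the mechanism concrete, here is a toy model of a repository network. It is only an illustration of the behavior described above, not GitHub's actual implementation: the class `RepoNetwork`, its methods, the repo names, and the rule for picking the new root are all hypothetical. The key assumption is that every repo in the network shares one object store, so deleting a repo removes a name, not the commits.

```python
from dataclasses import dataclass, field


@dataclass
class RepoNetwork:
    """Toy model: all repos in one network share a single object store."""
    root: str
    objects: dict = field(default_factory=dict)  # commit hash -> content
    repos: dict = field(default_factory=dict)    # repo name -> hashes listed in that repo

    def __post_init__(self):
        self.repos.setdefault(self.root, set())

    def fork(self, upstream: str, name: str) -> None:
        # A fork starts with the commits its upstream lists at fork time.
        self.repos[name] = set(self.repos[upstream])

    def commit(self, repo: str, sha: str, content: str) -> None:
        self.objects[sha] = content   # stored once, network-wide
        self.repos[repo].add(sha)     # listed only in the repo that made it

    def delete(self, repo: str) -> None:
        del self.repos[repo]
        if repo == self.root and self.repos:
            # Promote a surviving fork to root (arbitrary pick in this toy).
            self.root = sorted(self.repos)[0]
        # Note: self.objects is deliberately left untouched.

    def fetch_by_hash(self, sha: str):
        # Direct lookup ignores which repo, if any, still lists the commit.
        return self.objects.get(sha)


net = RepoNetwork(root="upstream")
net.commit("upstream", "a1b2c3", "initial code")
net.fork("upstream", "fork")
net.commit("upstream", "d4e5f6", "API_KEY=hunter2")  # made after the fork
net.delete("upstream")

print(net.root)                        # fork
print("d4e5f6" in net.repos["fork"])   # False: the fork never listed it
print(net.fetch_by_hash("d4e5f6"))     # API_KEY=hunter2
```

The model does not cover what happens when the last repo in a network is removed; it only shows why a commit can outlive the repo that created it.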

Destructive actions in GitHub’s repository network remove references to commit data from the standard GitHub UI and from normal git operations. However, the data still exists and is accessible to anyone who knows the commit hash. The commit hash can be discovered via the root node, so as long as any repository in the network survives to act as the root node, the hashes can be found. This is the tie-in between CFOR and IDOR vulnerabilities: if you know the commit hash, you can directly access data that was not intended for you.
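
As a minimal sketch of what "directly accessing the commit hash" looks like in practice, the helper below builds the web address of a commit in a surviving repo. It assumes GitHub's usual `https://github.com/<owner>/<repo>/commit/<hash>` URL layout. The function name `commit_url` and the owner and repo names are hypothetical placeholders. The 4-character lower bound mirrors Git's minimum abbreviation length and is an assumption about which short hashes the site will resolve. The code makes no network request.

```python
import re

# Full SHA-1 hashes are 40 hex characters; abbreviated hashes are shorter.
_HASH_RE = re.compile(r"[0-9a-f]{4,40}")


def commit_url(owner: str, repo: str, sha: str) -> str:
    """Return the web URL for a commit, addressed by hash, in owner/repo."""
    sha = sha.lower()
    if not _HASH_RE.fullmatch(sha):
        raise ValueError(f"not a plausible commit hash: {sha!r}")
    return f"https://github.com/{owner}/{repo}/commit/{sha}"


# The repo named here is any surviving member of the network (for example a
# fork), not necessarily the repo where the commit was originally made.
print(commit_url("example-org", "example-fork", "D4E5F6A"))
# https://github.com/example-org/example-fork/commit/d4e5f6a
```

The point of the sketch is that nothing in the address depends on the deleted or private repo: only a reachable repo in the same network and the hash are needed.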

Below we summarize the way CFOR works to expose data that you think is safely deleted:

Accessing Deleted Fork Data (research by Truffle Security)

  • Scenario: A user forks a public repository, commits code, and then deletes the fork. Despite the deletion, the committed data remains accessible forever.
  • Example: A test on commonly-forked public repositories revealed 40 valid API keys from deleted forks. The typical user behavior involved forking the repo, hard-coding an API key, doing some work, and then deleting the fork.

Accessing Deleted Repository Data

  • Scenario: An upstream public repository is forked by a user. Additional commits are made to the upstream repo and then the repo is deleted. The data from the deleted repo is still accessible via the fork.
  • Example: A critical issue was identified when a major tech company committed a private key for an employee’s GitHub account. Even after deleting the repo, the data was still accessible through a fork.

Accessing Private Repository Data

  • Scenario: An organization creates a private repo, forks it internally, and commits additional code for features not intended for public release. When the upstream repository is made public, the commits made during the private stage become accessible.
  • Example: Organizations open-sourcing new tools while maintaining private forks may inadvertently expose confidential data. Commits made before changing the visibility of the upstream repository are accessible via the public version.

Not a bug, a feature

GitHub acknowledges that its system is designed to work this way and documents the behavior in its official documentation. Because data from deleted repositories and forks remains accessible, users can easily misjudge how secure their data is, and the persistence of data after deletion creates a critical security risk.

The only secure way to remediate a key leaked in a public GitHub repository is to rotate it.
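
Before rotating, you need to know which keys were ever committed. The sketch below is one rough way to flag candidates in a piece of text, such as the output of `git log -p`. The function name `find_candidate_secrets` and both patterns are illustrative assumptions, not a complete or authoritative rule set: the first matches the well-known shape of an AWS access key ID, the second matches simple `name = value` assignments. Expect both false positives and misses; a dedicated secret scanner is the better tool for real audits.

```python
import re

# Illustrative patterns only; real scanners use far larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_assignment": re.compile(
        r"(?:api[_-]?key|secret|password|token)\s*[:=]\s*['\"]?[^\s'\"]{8,}",
        re.IGNORECASE,
    ),
}


def find_candidate_secrets(text: str) -> list:
    """Return (line_number, pattern_name) for every line that looks like a secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


sample = (
    'key = "AKIAIOSFODNN7EXAMPLE"\n'   # AWS's documented example key ID
    'print("hello")\n'
    "password: correct-horse-battery\n"
)
print(find_candidate_secrets(sample))
# [(1, 'aws_access_key_id'), (3, 'generic_assignment')]
```

Every hit should be treated as a key to rotate, whether or not the commit that contained it still appears in the repo.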
