A large-scale malicious campaign known as “EmeraldWhale” has been identified, targeting exposed Git configuration files to steal over 15,000 cloud account credentials from thousands of private repositories. Sysdig, which uncovered the operation, reports that the campaign uses automated tools to scan IP ranges for exposed Git configuration files, which may contain authentication tokens.
These tokens are then used to download repositories hosted on platforms such as GitHub, GitLab, and Bitbucket, which are searched for further credentials. The stolen data was transferred to Amazon S3 buckets belonging to other victims and was later used in phishing and spam campaigns, as well as being sold to other cybercriminals. Exposed Git authentication tokens not only enable data theft but can also lead to extensive data breaches, similar to the recent incident involving the Internet Archive.
Exposed Git configuration files
Git configuration files, including /.git/config and .gitlab-ci.yml, serve to establish various settings such as repository paths, branches, remotes, and occasionally authentication details like API keys, access tokens, and passwords.
Developers may opt to store these sensitive credentials in private repositories for ease of use, facilitating data transfers and API interactions without the need for repeated authentication setups. This practice poses minimal risk, provided that the repository is adequately secured against public access. However, if the /.git directory containing the configuration file is inadvertently exposed on a website, malicious actors employing scanning tools could readily discover and access these files.
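As an illustration of the kind of embedded credential described above, the following minimal sketch scans the text of a Git configuration file for remote URLs that carry a username or token in the URL itself. The sample configuration, the placeholder token, and the function name `find_embedded_credentials` are hypothetical and not taken from the Sysdig report; the check only covers HTTP(S) remotes and is not exhaustive.

```python
import re
from urllib.parse import urlsplit

# Matches "url = <value>" lines, as found under [remote "..."] sections.
URL_LINE = re.compile(r"^[ \t]*url[ \t]*=[ \t]*(\S+)[ \t]*$", re.MULTILINE)

# Hypothetical sample; the token is a placeholder, not a real credential.
SAMPLE_CONFIG = """\
[core]
\trepositoryformatversion = 0
[remote "origin"]
\turl = https://deploy-bot:EXAMPLE_TOKEN_NOT_REAL@github.com/example-org/example-app.git
\tfetch = +refs/heads/*:refs/remotes/origin/*
[remote "backup"]
\turl = git@github.com:example-org/example-app.git
"""


def find_embedded_credentials(config_text):
    """Return hostnames of HTTP(S) remotes whose URL embeds a user or token."""
    hosts = []
    for match in URL_LINE.finditer(config_text):
        parts = urlsplit(match.group(1))
        # Only report the host, so the secret itself is never echoed.
        if parts.scheme in ("http", "https") and (parts.username or parts.password):
            hosts.append(parts.hostname)
    return hosts


if __name__ == "__main__":
    for host in find_embedded_credentials(SAMPLE_CONFIG):
        print(f"remote on {host} embeds credentials in its URL")
```

Running a check like this against a local `.git/config` before deploying a site is one way to notice a token that would become public if the `.git` directory were served.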
Should these compromised configuration files include authentication tokens, they could be exploited to retrieve associated source code, databases, and other sensitive materials not meant for public visibility. The threat group known as EmeraldWhale utilizes open-source tools such as ‘httpx’ and ‘Masscan’ to scan websites across an estimated 500 million IP addresses, organized into 12,000 distinct IP ranges.
According to Sysdig, the hackers have even compiled files that enumerate every possible IPv4 address, totaling over 4.2 billion entries, to facilitate future scanning efforts. Their scans primarily focus on identifying exposed /.git/config files and environment files (.env) within Laravel applications, which may also harbor API keys and cloud credentials.
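Site owners can test for the same exposure on their own servers. The sketch below requests `/.git/config` from a base URL and applies a simple heuristic (the presence of a `[core]` section header) to decide whether a real Git configuration file was returned. The function names and the heuristic are assumptions made for illustration; they are not the tooling described by Sysdig, and the check should only be run against hosts you operate.

```python
import urllib.request


def looks_like_git_config(body):
    """Heuristic: Git config files normally contain a [core] section header."""
    return any(line.strip().lower() == "[core]" for line in body.splitlines())


def git_config_exposed(base_url, timeout=5.0):
    """Return True if <base_url>/.git/config is served and looks like a Git config."""
    url = base_url.rstrip("/") + "/.git/config"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Read at most 64 KiB; a config file is far smaller than that.
            body = resp.read(65536).decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # HTTP errors (for example 403 or 404), network failures, and
        # malformed URLs are all treated as "not exposed".
        return False
    return looks_like_git_config(body)
```

A result of `True` means the web server is serving the repository metadata directory and should be reconfigured to deny access to `/.git/`.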
Upon detecting an exposure, the tokens are validated through ‘curl’ commands directed at various APIs, and if confirmed as valid, they are employed to download private repositories. These repositories are subsequently re-examined for authentication secrets related to AWS, cloud services, and email providers. The threat actors have leveraged the exposed authentication tokens from email platforms to execute spam and phishing operations. Sysdig has noted the utilization of two common toolsets to enhance the efficiency of this extensive operation, specifically MZR V2 (Mizaru) and Seyzo-v2.
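Defenders can run the same kind of search over their own repositories before an attacker does. The following is a minimal sketch of a pattern-based secret scan. It assumes two widely documented token shapes (AWS access key IDs beginning with `AKIA` and classic GitHub personal access tokens beginning with `ghp_`); the pattern set and the function name `scan_text` are illustrative, and dedicated secret scanners cover far more formats.

```python
import re

# Illustrative patterns only; real scanners ship much larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def scan_text(text):
    """Return (line_number, pattern_name) pairs for lines matching a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = "region = us-east-1\naws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
    for lineno, name in scan_text(sample):
        print(f"line {lineno}: possible {name}")
```

Any hit should be treated as a compromised secret and rotated, since removing it from the latest commit does not remove it from the repository history.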
Evaluating the stolen data
Sysdig analyzed an exposed S3 bucket and discovered approximately one terabyte of sensitive information, including stolen credentials and logging data. The investigation revealed that EmeraldWhale had stolen 15,000 cloud credentials from a total of 67,000 URLs that had exposed configuration files. Of these URLs, 28,000 corresponded to Git repositories, 6,000 to GitHub tokens, and 2,000 were validated as active credentials.
In addition to major platforms such as GitHub, GitLab, and Bitbucket, the attackers also targeted 3,500 smaller repositories belonging to small teams and individual developers. Sysdig reported that plain lists of URLs pointing to exposed Git configuration files are sold on Telegram for approximately $100; however, those who extract and validate the secrets have far more lucrative monetization opportunities.
The researchers emphasized that this campaign is not particularly advanced, relying on readily available tools and automation, yet it successfully compromised thousands of secrets that could result in severe data breaches. To reduce the risk, software developers are advised to utilize specialized secret management tools for storing sensitive information and to employ environment variables for configuring sensitive settings at runtime, rather than embedding them directly in Git configuration files.
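A minimal sketch of the environment-variable approach follows. The variable name `EXAMPLE_API_TOKEN`, the helper `require_secret`, and the exception class are hypothetical; in practice the variable would be populated at deploy time by a secret manager or the hosting platform rather than written into any file tracked by Git.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_secret(name, environ=None):
    """Return the secret stored under `name`, failing loudly if it is unset."""
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if not value:
        raise MissingSecretError(f"environment variable {name} is not set")
    return value


if __name__ == "__main__":
    # A dict stands in for the real environment so the demo runs anywhere.
    demo_env = {"EXAMPLE_API_TOKEN": "placeholder-value"}
    token = require_secret("EXAMPLE_API_TOKEN", demo_env)
    print(f"loaded a token of length {len(token)}")
```

Failing at startup when the variable is missing keeps a misconfigured service from running with an empty credential, and the token never appears in the repository or its configuration.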