Introduction
In this article, we will guide you through setting up a basic monitoring system to detect leaked secrets in GitHub Gists using TruffleHog. This process will help you to identify sensitive information, such as API keys and passwords, that may have been unintentionally exposed in public Gists.
Gists
GitHub Gists are a convenient tool for sharing code snippets, notes, and other text files. They function as lightweight repositories, allowing developers to easily share individual files or small collections of files with version control. Gists can be public, accessible to anyone, or secret, reachable only by people who have the unique link. With support for multiple file types, code syntax highlighting, and Markdown, Gists are perfect for sharing scripts, configuration files, or even quick tutorials. They also integrate seamlessly with GitHub, allowing users to fork, clone, and track changes just like with full repositories.
GitHub Gists API
The GitHub Gists API provides developers with programmatic access to GitHub’s Gists feature, enabling the creation, retrieval, updating, and deletion of gists directly through HTTP requests. With the API, you can automate tasks such as fetching public gists from a shell script. The Gists API is described in GitHub's REST API documentation. To use the API as shown in this article, you will need an access token; GitHub's documentation on personal access tokens explains how to generate one.
Below is an example request to fetch the latest public Gists:
curl -L \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer github_pat_XXXXXXXXXXXXXX" \
-H "X-GitHub-Api-Version: 2022-11-28" \
-s https://api.github.com/gists/public
You can fetch a maximum of 3000 results through pagination, with up to 100 results per page.
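The pagination limit can be made concrete with a short sketch. It assumes the standard `page` and `per_page` query parameters of the GitHub REST API; the `gists_page_url` and `fetch_all_pages` helpers and the `GITHUB_TOKEN` variable are names invented for this illustration. With 100 results per page, 30 pages cover the 3000-result maximum.

```shell
#!/bin/bash
# Sketch: walk the paginated public Gists listing.
# gists_page_url, fetch_all_pages and GITHUB_TOKEN are illustrative names.

PER_PAGE=100
MAX_RESULTS=3000
MAX_PAGES=$(( MAX_RESULTS / PER_PAGE ))   # 30 pages of 100 results each

# Build the request URL for a given page number.
gists_page_url() {
    local page=$1
    echo "https://api.github.com/gists/public?per_page=${PER_PAGE}&page=${page}"
}

# Fetch every page in turn. Defined here but not called automatically.
fetch_all_pages() {
    local page
    for (( page = 1; page <= MAX_PAGES; page++ )); do
        curl -L -s \
            -H "Accept: application/vnd.github+json" \
            -H "Authorization: Bearer ${GITHUB_TOKEN}" \
            -H "X-GitHub-Api-Version: 2022-11-28" \
            "$(gists_page_url "$page")"
    done
}

echo "Pages needed: $MAX_PAGES"
echo "First page:   $(gists_page_url 1)"
echo "Last page:    $(gists_page_url "$MAX_PAGES")"
```

Calling `fetch_all_pages` with `GITHUB_TOKEN` exported would issue the 30 requests one after another; the monitoring script in this article instead polls a single page repeatedly.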
TruffleHog
TruffleHog, developed by Truffle Security, is a powerful security tool designed to detect and prevent the leakage of sensitive information in code repositories. By scanning the entire commit history of Git repositories, TruffleHog identifies secrets such as API keys, passwords, and other confidential data using entropy analysis and regex matching. It integrates seamlessly into CI/CD pipelines, helping developers catch and eliminate potential security risks before they are pushed to production. Ideal for security audits and incident response, TruffleHog is an essential tool for safeguarding codebases against inadvertent exposure of sensitive credentials.
Putting it all together
Now, let’s bring everything together by creating a simple shell script that will query the Gists API for public Gists, fetch each repository’s URL, and scan it with TruffleHog.
We’ll start by defining a few variables for temporary files, logs, and the access token:
#!/bin/bash
GISTS_DOWNLOADED=downloaded-gists.tmp
GISTS_URLS=gist-urls.tmp
PREV_GISTS_URLS=downloaded-gist-urls.log
TRUFFLE_LOG="trufflehog.log"
NUM=100
TIMESTAMP=""
TOKEN="github_pat_XXXXXXXXXXXXXX"
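Hard-coding the token is fine for a quick test, but an alternative is to read it from the environment so that it never lands in the script file itself. The sketch below assumes an environment variable named `GITHUB_TOKEN`; that name and the `load_token` helper are choices made for this example, not something the script requires.

```shell
#!/bin/bash
# Sketch: take the access token from the environment instead of the script.
# GITHUB_TOKEN and load_token are hypothetical names chosen for this example.

load_token() {
    if [[ -z "${GITHUB_TOKEN:-}" ]]; then
        echo "GITHUB_TOKEN is not set" >&2
        return 1
    fi
    TOKEN=$GITHUB_TOKEN
}

if load_token; then
    echo "Token loaded (${#TOKEN} characters)"
else
    echo "Export GITHUB_TOKEN before starting the monitor"
fi
```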
Next, we’ll continuously query the Gists API in an infinite loop, extracting the Git pull URLs for each Gist. These URLs will be saved in a sorted text file, ensuring there are no duplicates:
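while true; do
    url="https://api.github.com/gists/public?per_page=$NUM&since=$TIMESTAMP"
    echo "URL: $url"
    # Record the time before the request ($url already holds the previous value),
    # so Gists updated while the request is in flight are not skipped next round
    TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
    curl -L \
        -H "Accept: application/vnd.github+json" \
        -H "Authorization: Bearer $TOKEN" \
        -H "X-GitHub-Api-Version: 2022-11-28" \
        -s \
        -o "$GISTS_DOWNLOADED" "$url"
    # Extract each Gist's pull URL, sorted and de-duplicated
    jq -r '.[].git_pull_url' "$GISTS_DOWNLOADED" | sort -u > "$GISTS_URLS"
    # Reading from stdin makes wc print only the count, without the file name
    echo "Downloaded $(wc -l < "$GISTS_URLS") Gist URLs"
    # Merge this round's URLs into the log of already-seen Gists
    cat "$GISTS_URLS" >> "$PREV_GISTS_URLS"
    sort -o "$PREV_GISTS_URLS" -u "$PREV_GISTS_URLS"
    rm "$GISTS_URLS" "$GISTS_DOWNLOADED"
    echo "Done, sleeping..."
    sleep 600
done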
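To see what the `jq` and `sort -u` steps do without calling the API, the following sketch runs them on a small hand-written response. The sample JSON is invented and contains only the `git_pull_url` field that the script reads; real responses carry many more fields.

```shell
#!/bin/bash
# Sketch: extract and de-duplicate pull URLs from a made-up API response.
sample_response='[
  {"git_pull_url": "https://gist.github.com/bbb222.git"},
  {"git_pull_url": "https://gist.github.com/aaa111.git"},
  {"git_pull_url": "https://gist.github.com/bbb222.git"}
]'

# Same pipeline as in the loop: one URL per line, sorted, duplicates removed.
urls=$(printf '%s\n' "$sample_response" | jq -r '.[].git_pull_url' | sort -u)
printf '%s\n' "$urls"
```

The duplicate `bbb222` entry collapses into one line, so two URLs remain, in sorted order.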
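The final step of the script is to run TruffleHog on each repository that has not been scanned before. In the complete script, this loop sits inside the polling loop, before the new URLs are appended to the log:
for gist in $(cat "$GISTS_URLS"); do
    # Scan only URLs not yet in the log: -F fixed string, -x whole line, -q quiet.
    # A missing log file (first run) counts as "not seen".
    if ! grep -Fxq -- "$gist" "$PREV_GISTS_URLS" 2>/dev/null; then
        echo "[+] gist: ${gist}"
        trufflehog git --no-update --concurrency=3 -j "$gist" --only-verified | tee -a "$TRUFFLE_LOG"
    fi
done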
We’re focusing solely on verified credentials, with all findings displayed on the screen and saved to a text file.
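The per-URL `grep` check can also be expressed as a single set difference. The sketch below is an alternative formulation, not the method the script uses: it relies on `grep -Fxv -f` to print the lines of the new list that do not appear in the log. The file contents are made-up sample URLs written to a temporary directory.

```shell
#!/bin/bash
# Sketch: compute "new minus already seen" with one grep call.
workdir=$(mktemp -d)
seen="$workdir/seen.log"
fresh="$workdir/fresh.tmp"

printf '%s\n' \
    "https://gist.github.com/aaa111.git" \
    "https://gist.github.com/bbb222.git" > "$seen"
printf '%s\n' \
    "https://gist.github.com/bbb222.git" \
    "https://gist.github.com/ccc333.git" > "$fresh"

# -F fixed strings, -x whole-line match, -v invert, -f read patterns from a file.
# "|| true" because grep exits with status 1 when no line is new.
unseen=$(grep -Fxv -f "$seen" "$fresh" || true)
echo "$unseen"

rm -r "$workdir"
```

Only the `ccc333` URL is printed, since `bbb222` is already present in the log file.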
The complete script is provided below for your reference, including all the necessary steps and configurations:
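#!/bin/bash
GISTS_DOWNLOADED=downloaded-gists.tmp
GISTS_URLS=gist-urls.tmp
PREV_GISTS_URLS=downloaded-gist-urls.log
TRUFFLE_LOG="trufflehog.log"
NUM=100
TIMESTAMP=""
TOKEN="github_pat_XXXXXXXXXXXXXX"
while true; do
    url="https://api.github.com/gists/public?per_page=$NUM&since=$TIMESTAMP"
    echo "URL: $url"
    # Record the time before the request ($url already holds the previous value),
    # so Gists updated while the request is in flight are not skipped next round
    TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
    curl -L \
        -H "Accept: application/vnd.github+json" \
        -H "Authorization: Bearer $TOKEN" \
        -H "X-GitHub-Api-Version: 2022-11-28" \
        -s \
        -o "$GISTS_DOWNLOADED" "$url"
    # Extract each Gist's pull URL, sorted and de-duplicated
    jq -r '.[].git_pull_url' "$GISTS_DOWNLOADED" | sort -u > "$GISTS_URLS"
    # Reading from stdin makes wc print only the count, without the file name
    echo "Downloaded $(wc -l < "$GISTS_URLS") Gist URLs"
    for gist in $(cat "$GISTS_URLS"); do
        # Scan only URLs not yet in the log: -F fixed string, -x whole line, -q quiet.
        # A missing log file (first run) counts as "not seen".
        if ! grep -Fxq -- "$gist" "$PREV_GISTS_URLS" 2>/dev/null; then
            echo "[+] gist: ${gist}"
            trufflehog git --no-update --concurrency=3 -j "$gist" --only-verified | tee -a "$TRUFFLE_LOG"
        fi
    done
    # Merge this round's URLs into the log of already-seen Gists
    cat "$GISTS_URLS" >> "$PREV_GISTS_URLS"
    sort -o "$PREV_GISTS_URLS" -u "$PREV_GISTS_URLS"
    rm "$GISTS_URLS" "$GISTS_DOWNLOADED"
    echo "Done, sleeping..."
    sleep 600
done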
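One gap worth noting: when a request fails (for example because of an invalid token), the API returns a JSON object describing the error rather than an array, and the `.[].git_pull_url` filter then fails. A possible guard, sketched here with invented sample responses and a hypothetical `is_gist_list` helper, is to check the top-level type with `jq -e` before extracting URLs.

```shell
#!/bin/bash
# Sketch: accept a response only if its top-level JSON value is an array.
# is_gist_list is a hypothetical helper; both sample responses are made up.

is_gist_list() {
    # jq -e derives the exit status from the last output: 0 for true, 1 for false.
    printf '%s\n' "$1" | jq -e 'type == "array"' > /dev/null 2>&1
}

good='[{"git_pull_url": "https://gist.github.com/aaa111.git"}]'
bad='{"message": "Bad credentials"}'

for response in "$good" "$bad"; do
    if is_gist_list "$response"; then
        echo "array: safe to extract URLs"
    else
        echo "not an array: skip this round and retry later"
    fi
done
```

In the monitoring loop, the same check could be applied to the downloaded file, skipping the extraction and scan steps for that round when it fails.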
Displaying results
With the -j flag, TruffleHog prints its results as JSON, one object per line, and the script appends them to trufflehog.log. To format the output and extract the most important information, you can use the jq command. An example of how to do this is provided below:
cat trufflehog.log | jq 'select(.Verified == true) | {DetectorName: .DetectorName, verified: .Verified, Raw: .Raw, RawV2: .RawV2, email: .SourceMetadata.Data.Git.email, repo: .SourceMetadata.Data.Git.repository}'
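For a quick overview, the findings can also be counted per detector. The sketch below assumes the log holds one JSON object per line with the `DetectorName` and `Verified` fields used in the filter above; the three sample records are invented and heavily trimmed.

```shell
#!/bin/bash
# Sketch: count verified findings per detector from JSON-lines output.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
{"DetectorName": "AWS", "Verified": true}
{"DetectorName": "Github", "Verified": true}
{"DetectorName": "AWS", "Verified": true}
EOF

# Print each detector name, then let sort and uniq do the counting;
# awk reorders "count name" into "name: count".
summary=$(jq -r 'select(.Verified == true) | .DetectorName' "$sample_log" \
    | sort | uniq -c | sort -rn | awk '{print $2 ": " $1}')
echo "$summary"

rm "$sample_log"
```

Pointing the same pipeline at `trufflehog.log` instead of the sample file would give the per-detector counts for the real findings.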
Conclusion
By leveraging TruffleHog, you can automate the detection of sensitive information, such as API keys and passwords, that may have been inadvertently exposed in public Gists.