Monitoring Gists for secrets with TruffleHog

Published: at 08:01 AM

Introduction

In this article, we will guide you through setting up a basic monitoring system to detect leaked secrets in GitHub Gists using TruffleHog. This process will help you to identify sensitive information, such as API keys and passwords, that may have been unintentionally exposed in public Gists.

Gists

GitHub Gists are a convenient way to share code snippets, notes, and other text files. Each Gist is a lightweight Git repository, so individual files or small collections of files can be shared with full version history. Gists can be public, meaning they are listed and searchable by anyone, or secret, meaning they are unlisted but still readable by anyone who has the link. With support for multiple file types, syntax highlighting, and Markdown, Gists work well for sharing scripts, configuration files, or quick tutorials. Because they are Git repositories, users can fork and clone them and track changes just as with full repositories.

GitHub Gists API

The GitHub Gists API provides programmatic access to Gists, allowing you to create, retrieve, update, and delete them through HTTP requests. This makes it possible, for example, to fetch public Gists automatically from a shell script. The Gists API documentation can be found here. To use the API as shown in this article you will need an access token; follow the instructions here to generate one.

Below is an example request that fetches the most recently updated public Gists:

curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer github_pat_XXXXXXXXXXXXXX" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  -s https://api.github.com/gists/public

Results are paginated. You can fetch at most 3,000 Gists in total, with up to 100 per page.
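
The page limit can be turned into a small helper. The following is a minimal sketch, not part of the script developed later in this article: the function names gists_page_url, max_pages, and fetch_all_public_gists and the output file names are hypothetical, and it assumes the page and per_page query parameters of the Gists API together with the 3,000-result ceiling.

```shell
#!/bin/bash

# Build the URL for one page of the public Gists listing.
gists_page_url() {
  local page="$1" per_page="${2:-100}"
  printf 'https://api.github.com/gists/public?per_page=%s&page=%s\n' "$per_page" "$page"
}

# Number of pages needed to reach the 3000-result ceiling.
max_pages() {
  local per_page="${1:-100}" limit=3000
  echo $(( (limit + per_page - 1) / per_page ))
}

# Fetch pages one by one into gists-page-N.json (hypothetical file names).
# Stops at the ceiling or at the first failed request.
fetch_all_public_gists() {
  local token="$1" per_page="${2:-100}" page=1 last
  last=$(max_pages "$per_page")
  while [ "$page" -le "$last" ]; do
    curl -L -s --fail \
      -H "Accept: application/vnd.github+json" \
      -H "Authorization: Bearer $token" \
      -H "X-GitHub-Api-Version: 2022-11-28" \
      -o "gists-page-$page.json" \
      "$(gists_page_url "$page" "$per_page")" || break
    page=$((page + 1))
  done
}
```

Calling fetch_all_public_gists with a token as its first argument would request up to 30 pages of 100 Gists each.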

TruffleHog

TruffleHog, developed by Truffle Security, is a powerful security tool designed to detect and prevent the leakage of sensitive information in code repositories. By scanning the entire commit history of Git repositories, TruffleHog identifies secrets such as API keys, passwords, and other confidential data using entropy analysis and regex matching. It integrates seamlessly into CI/CD pipelines, helping developers catch and eliminate potential security risks before they are pushed to production. Ideal for security audits and incident response, TruffleHog is an essential tool for safeguarding codebases against inadvertent exposure of sensitive credentials.

Putting it all together

Now, let’s bring everything together by creating a simple shell script that will query the Gists API for public Gists, fetch each repository’s URL, and scan it with TruffleHog.

We’ll start by defining a few variables for temporary files, logs, and the access token:

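#!/bin/bash

# Temporary files, rewritten on every iteration
GISTS_DOWNLOADED="downloaded-gists.tmp"
GISTS_URLS="gist-urls.tmp"
# Persistent, sorted list of Gist URLs that were already scanned
PREV_GISTS_URLS="downloaded-gist-urls.log"
# TruffleHog findings, one JSON object per line
TRUFFLE_LOG="trufflehog.log"
# Gists per API request (the API allows at most 100)
NUM=100
# Only Gists updated after this time are requested; empty on the first run
TIMESTAMP=""
# GitHub personal access token (placeholder)
TOKEN="github_pat_XXXXXXXXXXXXXX"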
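
Hard-coding the token is fine for a quick test, but it places a secret in the script itself. As a sketch of one alternative, the token can be read from an environment variable instead. The variable name GITHUB_TOKEN and the helper load_token are assumptions made for illustration; they are not part of the script in this article.

```shell
#!/bin/bash

# Print the token from the GITHUB_TOKEN environment variable (assumed name).
# Returns non-zero with a message on stderr when it is unset or empty.
load_token() {
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set" >&2
    return 1
  fi
  printf '%s\n' "$GITHUB_TOKEN"
}
```

With this helper, the token assignment could become TOKEN=$(load_token) || exit 1, so the script stops early when no token is available.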

Next, we’ll continuously query the Gists API in an infinite loop, extracting the Git pull URLs for each Gist. These URLs will be saved in a sorted text file, ensuring there are no duplicates:

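while true; do
  # Omit the "since" parameter on the first run, when no timestamp is set yet
  url="https://api.github.com/gists/public?per_page=$NUM"
  if [[ -n "$TIMESTAMP" ]]; then
    url+="&since=$TIMESTAMP"
  fi
  echo "URL: $url"

  # Record the time before the request so that Gists updated while it runs
  # are picked up by the next iteration
  NEXT_TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

  curl -L \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $TOKEN" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    -s \
    -o "$GISTS_DOWNLOADED" "$url"

  TIMESTAMP=$NEXT_TIMESTAMP

  # Extract the clone URLs, sorted and without duplicates
  jq -r '.[].git_pull_url' "$GISTS_DOWNLOADED" | sort -u > "$GISTS_URLS"

  echo "Downloaded $(wc -l < "$GISTS_URLS") Gist URLs"

  # Merge the new URLs into the history file, keeping it sorted and unique
  cat "$GISTS_URLS" >> "$PREV_GISTS_URLS"
  sort -o "$PREV_GISTS_URLS" -u "$PREV_GISTS_URLS"

  rm "$GISTS_URLS" "$GISTS_DOWNLOADED"

  echo "Done, sleeping..."
  sleep 600
done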
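
Because the URL lists are sorted and free of duplicates, the URLs that are new in the current batch can also be computed in a single step with comm. This is a sketch of that idea rather than what the script in this article does; the function name new_urls is hypothetical.

```shell
#!/bin/bash

# Print the lines of the current batch ($1) that are absent from the file of
# already seen URLs ($2). Both inputs are re-sorted in the C locale first,
# because comm requires its inputs to be sorted consistently.
new_urls() {
  local current="$1" seen="$2" tmp_cur tmp_seen
  # Treat a missing history file as empty
  [ -f "$seen" ] || seen=/dev/null
  tmp_cur=$(mktemp) || return 1
  tmp_seen=$(mktemp) || return 1
  LC_ALL=C sort -u "$current" > "$tmp_cur"
  LC_ALL=C sort -u "$seen" > "$tmp_seen"
  # -1 hides lines only in the first file, -3 hides lines common to both,
  # so only lines unique to the current batch remain
  LC_ALL=C comm -13 "$tmp_seen" "$tmp_cur"
  rm -f "$tmp_cur" "$tmp_seen"
}
```

Iterating over the output of new_urls "$GISTS_URLS" "$PREV_GISTS_URLS" would then visit only Gists that have not been seen before.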

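The final step is to run TruffleHog on each repository that has not been scanned before. This loop belongs inside the while loop, after the URLs have been extracted and before they are merged into the history file and the temporary files are removed: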

for gist in $(cat "$GISTS_URLS"); do
  # Scan only URLs that are not yet in the history file (exact, literal match)
  if ! grep -qxF -- "$gist" "$PREV_GISTS_URLS" 2>/dev/null; then
    echo "[+] gist: ${gist}"
    trufflehog git --no-update --concurrency=3 -j "$gist" --only-verified | tee -a "$TRUFFLE_LOG"
  fi
done

The --only-verified flag restricts the output to credentials that TruffleHog has confirmed to be valid, and tee -a both displays each finding on screen and appends it to the log file.

The complete script is provided below for your reference, including all the necessary steps and configurations:

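#!/bin/bash

# Temporary files, rewritten on every iteration
GISTS_DOWNLOADED="downloaded-gists.tmp"
GISTS_URLS="gist-urls.tmp"
# Persistent, sorted list of Gist URLs that were already scanned
PREV_GISTS_URLS="downloaded-gist-urls.log"
# TruffleHog findings, one JSON object per line
TRUFFLE_LOG="trufflehog.log"
# Gists per API request (the API allows at most 100)
NUM=100
# Only Gists updated after this time are requested; empty on the first run
TIMESTAMP=""
# GitHub personal access token (placeholder)
TOKEN="github_pat_XXXXXXXXXXXXXX"

while true; do
  # Omit the "since" parameter on the first run, when no timestamp is set yet
  url="https://api.github.com/gists/public?per_page=$NUM"
  if [[ -n "$TIMESTAMP" ]]; then
    url+="&since=$TIMESTAMP"
  fi
  echo "URL: $url"

  # Record the time before the request so that Gists updated while it runs
  # are picked up by the next iteration
  NEXT_TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

  curl -L \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $TOKEN" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    -s \
    -o "$GISTS_DOWNLOADED" "$url"

  TIMESTAMP=$NEXT_TIMESTAMP

  # Extract the clone URLs, sorted and without duplicates
  jq -r '.[].git_pull_url' "$GISTS_DOWNLOADED" | sort -u > "$GISTS_URLS"

  echo "Downloaded $(wc -l < "$GISTS_URLS") Gist URLs"
  for gist in $(cat "$GISTS_URLS"); do
    # Scan only URLs that are not yet in the history file (exact, literal match)
    if ! grep -qxF -- "$gist" "$PREV_GISTS_URLS" 2>/dev/null; then
      echo "[+] gist: ${gist}"
      trufflehog git --no-update --concurrency=3 -j "$gist" --only-verified | tee -a "$TRUFFLE_LOG"
    fi
  done

  # Merge the new URLs into the history file, keeping it sorted and unique
  cat "$GISTS_URLS" >> "$PREV_GISTS_URLS"
  sort -o "$PREV_GISTS_URLS" -u "$PREV_GISTS_URLS"

  rm "$GISTS_URLS" "$GISTS_DOWNLOADED"

  echo "Done, sleeping..."
  sleep 600
done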

Displaying results

Because the script runs TruffleHog with the -j flag, each finding is written to the log as a JSON object. To format the output and extract the most important fields, you can use the jq command, for example:

jq 'select(.Verified == true) | {
  DetectorName: .DetectorName,
  verified: .Verified,
  Raw: .Raw,
  RawV2: .RawV2,
  email: .SourceMetadata.Data.Git.email,
  repo: .SourceMetadata.Data.Git.repository
}' trufflehog.log
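
For a quick overview it can also help to count findings per detector. The sketch below assumes the log contains one JSON object per line with the DetectorName and Verified fields, and that detector names contain no spaces; the function name summarize_findings is hypothetical.

```shell
#!/bin/bash

# Count verified findings per detector in a TruffleHog JSON log ($1).
# Output: one "<count> <detector>" line per detector, most frequent first.
summarize_findings() {
  jq -r 'select(.Verified == true) | .DetectorName' "$1" \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'
}
```

Running summarize_findings trufflehog.log after the monitor has collected some results shows which kinds of credentials turn up most often.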

Conclusion

By leveraging TruffleHog, you can automate the detection of sensitive information, such as API keys and passwords, that may have been inadvertently exposed in public Gists.

References