Introduction
In the first two articles of this series, we covered both introductory and advanced Git features. Now, let’s dive into Git’s collaboration and automation tools. These features are particularly useful in team environments or when managing large repositories. We’ll explore Git Hooks, Submodules, Reflog, and other advanced workflows.
TL;DR
You can find a shorter cheat sheet version of this article here.
Git Hooks
Git hooks let you automate tasks when certain events happen in a repository, for example before or after a commit, push, or merge. Hooks are executable scripts (most often shell scripts) located in the .git/hooks directory of your project. You can use hooks to enforce coding standards, run tests, or trigger external services.
Example: Pre-commit Hook
Imagine you want to check for files larger than 1MB and prevent them from going into the commit. This is useful in projects where large files can bloat the repository:
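#!/bin/sh
# .git/hooks/pre-commit
# Prevent committing files larger than 1 MB
max_size=1000000 # 1 MB in bytes
echo "Checking for large files..."
# List staged files that were added, copied, or modified (deleted files are skipped).
# Reading line by line keeps file names with spaces intact.
git diff --cached --name-only --diff-filter=ACM | while IFS= read -r file; do
    if [ -f "$file" ]; then
        file_size=$(wc -c <"$file")
        if [ "$file_size" -gt "$max_size" ]; then
            echo "$file is too large ($file_size bytes). Maximum allowed size is 1 MB."
            exit 1
        fi
    fi
done || exit 1 # the loop runs in a subshell because of the pipe, so propagate its failure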
Make the hook executable, because Git skips hook files that are not executable:
chmod +x .git/hooks/pre-commit
Git now runs this script before every commit. If the script exits with a non-zero status, the commit is aborted.
Example: Commit-msg Hook
You can also use a commit-msg hook to enforce a commit message format, for example a minimum length:
#!/bin/sh
# .git/hooks/commit-msg
# Enforce a commit message length of 10 characters or more
min_length=10
# $1 is the path to the file that holds the proposed commit message
commit_message=$(cat "$1")
if [ "${#commit_message}" -lt "$min_length" ]; then
    echo "Error: Commit message is too short." >&2
    exit 1
fi
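To try the commit-msg hook without touching a real project, the sketch below installs it in a throwaway repository and attempts two commits. The temporary directory, the demo identity, and the file name notes.txt are assumptions made purely for illustration.

```shell
#!/bin/sh
# Demo: install the commit-msg hook in a scratch repository and exercise it.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # demo identity, local to this repo
git config user.email "demo@example.com"

# Write the hook and make it executable.
mkdir -p .git/hooks
cat > .git/hooks/commit-msg <<'EOF'
#!/bin/sh
min_length=10
commit_message=$(cat "$1")
if [ "${#commit_message}" -lt "$min_length" ]; then
    echo "Error: Commit message is too short." >&2
    exit 1
fi
EOF
chmod +x .git/hooks/commit-msg

echo "hello" > notes.txt
git add notes.txt

# Too short: the hook exits non-zero, so Git aborts the commit.
if git commit -q -m "wip" 2>/dev/null; then
    short_result="accepted"
else
    short_result="rejected"
fi

# Long enough: the commit goes through.
if git commit -q -m "Add notes file with greeting"; then
    long_result="accepted"
else
    long_result="rejected"
fi

echo "short message: $short_result"
echo "long message:  $long_result"
```

Because hooks live inside .git, they are not part of what gets cloned, so each collaborator has to install them locally.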
Submodules
Git submodules allow you to include external repositories inside your project. This is useful for managing dependencies or shared libraries that should be developed separately but still included in your main project.
Example: Adding a Submodule
You can add a submodule to your project like this:
git submodule add https://github.com/example/library.git path/to/submodule
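This command clones the external repository into path/to/submodule and records it in a .gitmodules file. The parent project tracks only the specific commit the submodule is checked out at, so changes inside the submodule must be committed in the submodule’s own repository, and the updated commit pointer is then committed in the parent.
To update a submodule to the latest commit on its remote branch, run: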
git submodule update --remote
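How the parent project tracks a submodule is easier to see in a small experiment. The sketch below uses two throwaway local repositories in place of the remote URL; the directory names, the demo identity, and the vendor/library path are hypothetical. The protocol.file.allow override is only there because the demo clones from a local path, which recent Git versions refuse for submodules by default.

```shell
#!/bin/sh
# Demo: add a submodule and inspect how the parent repository tracks it.
set -e
base=$(mktemp -d)

# A stand-in "library" repository (hypothetical; replaces the remote URL).
git init -q "$base/library"
cd "$base/library"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "library code" > lib.txt
git add lib.txt
git commit -q -m "Initial library commit"
lib_commit=$(git rev-parse HEAD)

# The main project that will embed the library.
git init -q "$base/project"
cd "$base/project"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "main project" > README.txt
git add README.txt
git commit -q -m "Initial project commit"

# Add the library as a submodule (override needed only for local paths).
git -c protocol.file.allow=always submodule add "$base/library" vendor/library
git commit -q -m "Add library as a submodule"

# The parent stores the submodule as one entry with mode 160000 (a "gitlink")
# that points at a single commit of the library.
entry=$(git ls-files --stage vendor/library)
echo "$entry"
cat .gitmodules
```

The commit ID in that entry is what changes in the parent when you update the submodule and commit the result.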
Reflog
Sometimes you might accidentally reset a branch or lose a commit, and Git’s reflog can help recover those lost changes. The reflog is a local record of every update to HEAD (and to branch tips), which makes it possible to find commits that no branch points to anymore.
Example: Recovering a Lost Commit
If you mistakenly reset a branch and lost commits, use reflog to find the commit:
git reflog
This lists the recent movements of HEAD, newest first:
d5f5e7f HEAD@{0}: reset: moving to HEAD^
a72c0e4 HEAD@{1}: commit: Added new feature
To recover the lost commit:
git reset --hard a72c0e4
Now your branch points again at the commit that the reset had dropped. Keep in mind that reset --hard also discards any uncommitted changes in the working tree.
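The recovery can be rehearsed safely in a scratch repository. The sketch below loses a commit on purpose and brings it back; the temporary directory, the demo identity, the file name app.txt, and the branch name recovered are assumptions for illustration. Instead of a second reset --hard, it first gives the lost commit a branch name, which is a slightly more cautious variant of the step above.

```shell
#!/bin/sh
# Demo: lose a commit with reset --hard, then recover it via the reflog.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

echo "v2" >> app.txt
git commit -q -am "Added new feature"
lost=$(git rev-parse HEAD)

# Oops: drop the newest commit.
git reset -q --hard HEAD^

# HEAD@{1} is where HEAD pointed before the reset.
git reflog -n 2

# Give the lost commit a branch name so it cannot be garbage-collected.
git branch recovered 'HEAD@{1}'
recovered=$(git rev-parse recovered)

# Fast-forward the current branch to the recovered commit.
git merge -q --ff-only recovered
echo "HEAD is back at: $(git log -1 --format=%s)"
```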
Bisect
The git bisect command is a debugging tool that helps you find the commit that introduced a bug. It runs a binary search over your Git history, so you can narrow down the offending commit in a handful of steps.
How git bisect Works
- Binary Search: git bisect splits your commit history into a “good” part where the code works as expected and a “bad” part where the bug is present. It then repeatedly checks out the middle commit and asks you whether it is good or bad, halving the search space with each step. This is much faster than checking commits one by one.
- Iterative Testing: You mark each commit as “good” (bug-free) or “bad” (buggy) as git bisect guides you through the process. Git keeps narrowing the range of suspect commits until it pinpoints the one that introduced the bug.
Basic Workflow
- Start Bisect: First, tell Git that you want to begin the bisect process:
  git bisect start
- Mark a Bad Commit: Identify a commit where the bug is present, usually the most recent one:
  git bisect bad
- Mark a Good Commit: Identify a commit from the past where the bug was not present:
  git bisect good <commit-id>
- Iterative Search: Git now checks out a commit in the middle of your good and bad range. Test this version of your code. If the bug is present, mark it as bad:
  git bisect bad
  If the bug is not present, mark it as good:
  git bisect good
- Repeat: Git keeps narrowing down the range of commits by checking out the midpoint, and you keep marking commits as either good or bad.
- Identify the Culprit: Once the offending commit is found, Git prints the commit details. You can inspect this commit to understand what change introduced the bug.
- End Bisect: After you’ve found the commit, end the bisect session and return to the commit you started from:
  git bisect reset
Example: Using git bisect
Suppose you know that the most recent commit (HEAD) contains a bug, but you are unsure when it was introduced. You also know that the code was working fine a few commits ago. Here’s how you can use git bisect:
- Start the bisect session:
  git bisect start
- Mark the most recent commit as bad:
  git bisect bad
- Mark an older commit as good (one where the bug wasn’t present):
  git bisect good abc1234
- Git checks out a middle commit, and you test it. Let’s say the bug is still present, so you mark it as bad:
  git bisect bad
- Git checks out another commit halfway through the remaining range, and you find the bug is not present, so you mark it as good:
  git bisect good
- This process continues until Git identifies the specific commit that introduced the bug.
Automating Bisect
You can automate the testing part of git bisect if your project has a script that can determine whether the bug is present. For example:
git bisect run ./test_script.sh
Git runs your script on each checked-out commit and marks the commit based on the script’s exit code: 0 means good, 125 means the commit cannot be tested and should be skipped, and any other code from 1 to 127 means bad.
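To watch an automated bisect from start to finish, the sketch below builds a scratch repository with ten commits, plants a marker line in the seventh, and lets git bisect run track it down. The temporary directory, the demo identity, the file app.txt, and the BUG marker are all hypothetical; in a real project the test command would be your own script.

```shell
#!/bin/sh
# Demo: let `git bisect run` find the commit that introduced a marker line.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Create ten commits; commit 7 introduces the "bug" (a line reading BUG).
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -eq 7 ]; then
        echo "BUG" >> app.txt
    else
        echo "line $i" >> app.txt
    fi
    git add app.txt
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then known_good=$(git rev-parse HEAD); fi
    if [ "$i" -eq 7 ]; then expected=$(git rev-parse HEAD); fi
    i=$((i + 1))
done

# Start with HEAD as bad and the first commit as good.
git bisect start HEAD "$known_good" >/dev/null

# The test command exits 0 (good) when BUG is absent and 1 (bad) when present.
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null

# refs/bisect/bad now points at the first bad commit.
culprit=$(git rev-parse refs/bisect/bad)
echo "First bad commit: $(git log -1 --format=%s "$culprit")"

git bisect reset >/dev/null 2>&1
```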
When to Use git bisect
- Finding the Cause of Bugs: If a bug has appeared and you can’t tell which change caused it, git bisect is one of the fastest ways to find the offending commit.
- Debugging Performance Regressions: You can use git bisect to find the commit where a performance regression was introduced by measuring performance at each step.
Advantages of git bisect
- Efficient: Instead of checking every commit, git bisect needs only about log2(n) steps for n commits, so roughly 10 tests are enough to search 1,000 commits.
- Accurate: It pinpoints the exact commit that introduced a bug, making it easier to fix.
Worktrees
Worktrees allow you to have multiple working directories in the same Git repository. This is particularly useful when you need to work on multiple branches simultaneously without switching between them.
Example: Creating a Worktree
Let’s say you’re working on a feature branch but need to quickly check something on the main branch:
git worktree add ../main-worktree main
This creates a new directory, ../main-worktree, where the main branch is checked out. You can now work on both branches simultaneously.
To remove a worktree:
git worktree remove ../main-worktree
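The sketch below walks through the full worktree life cycle in a scratch repository: create, inspect, remove. The temporary directory, the demo identity, and the branch name feature/new-feature are assumptions for illustration.

```shell
#!/bin/sh
# Demo: check out main in a second directory while staying on a feature branch.
set -e
base=$(mktemp -d)
git init -q "$base/project"
cd "$base/project"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "stable" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git branch -M main                       # make sure the branch is called main

# Switch to a feature branch in the primary working directory.
git checkout -q -b feature/new-feature

# Add a second working directory with main checked out.
git worktree add ../main-worktree main >/dev/null 2>&1

here=$(git rev-parse --abbrev-ref HEAD)
there=$(git -C ../main-worktree rev-parse --abbrev-ref HEAD)
echo "primary worktree: $here"
echo "linked worktree:  $there"
git worktree list

# Clean up the linked worktree when done.
git worktree remove ../main-worktree
```

A branch can be checked out in only one worktree at a time, which is why the demo moves the primary directory onto the feature branch before adding main elsewhere.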
Sparse Checkout
Sparse checkout is a Git feature that allows you to check out only specific parts of a large repository. This is useful when you only need a subset of files from a repository.
Example: Using Sparse Checkout
Let’s say you have a large monorepo hosted at https://github.com/example/monorepo.git, and you only want to work with the src/, include/, and docs/ directories.
First, clone the repository:
git clone --no-checkout https://github.com/example/monorepo.git
cd monorepo
Next, specify the directories or files you want to check out:
git sparse-checkout set src/ include/ docs/
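Then run git checkout to populate the working tree with the selected directories:
git checkout
This checks out only the src, include, and docs directories. In cone mode, the default in recent Git versions, files in the repository root are included as well.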
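The same sequence can be tried offline against a small local stand-in for the monorepo. In the sketch below, the temporary directories, the demo identity, and the extra tests directory are hypothetical; only the three Git commands in the second half mirror the steps above.

```shell
#!/bin/sh
# Demo: sparse checkout against a local stand-in for the monorepo.
set -e
base=$(mktemp -d)

# Build a small "monorepo" with four top-level directories (hypothetical layout).
git init -q "$base/monorepo"
cd "$base/monorepo"
git config user.name "Demo User"
git config user.email "demo@example.com"
for dir in src include docs tests; do
    mkdir "$dir"
    echo "content of $dir" > "$dir/file.txt"
done
git add .
git commit -q -m "Initial monorepo layout"

# Clone without populating the working tree, then select directories.
git clone -q --no-checkout "$base/monorepo" "$base/clone"
cd "$base/clone"
git sparse-checkout set src include docs
git checkout -q

# Only the selected directories appear; tests/ stays out of the working tree.
ls
```

Note that --no-checkout still downloads the full history; sparse checkout limits what lands in the working tree, not what is fetched.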
Git Flow and Branching Strategies
Git branching strategies help teams manage the development process more efficiently. Two common strategies are Git Flow and GitHub Flow.
Git Flow
In Git Flow, you have several branches:
- Main: For stable, production-ready code.
- Develop: For ongoing development.
- Feature: For individual features.
- Release: For preparing a release.
- Hotfix: For urgent fixes to production code.
Example: Starting a Feature Branch
git checkout -b feature/new-feature develop
After finishing the feature, you merge it back into the develop branch:
git checkout develop
git merge feature/new-feature
git push origin develop
GitHub Flow
GitHub Flow is a simpler alternative to Git Flow, involving just two kinds of branches:
- Main: For production-ready code.
- Feature branches: Created for each new feature and merged directly into main after review.
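As a rough sketch, one GitHub Flow cycle looks like this when reduced to local commands. The scratch repository, the demo identity, and the branch name feature/new-feature are assumptions; in practice the review and merge would normally happen through a pull request on the hosting platform rather than a local git merge.

```shell
#!/bin/sh
# Demo: a GitHub Flow cycle in a local scratch repository (no remote, no pull request).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git branch -M main

# 1. Branch off main for the new work.
git checkout -q -b feature/new-feature

# 2. Commit on the feature branch.
echo "new feature" >> app.txt
git commit -q -am "Add new feature"
feature_tip=$(git rev-parse HEAD)

# 3. After review, merge into main and delete the feature branch.
git checkout -q main
git merge -q --no-ff -m "Merge feature/new-feature" feature/new-feature
git branch -d feature/new-feature >/dev/null

git log --oneline --graph
```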
Conclusion
In this third part of our Git series, we explored essential features for collaboration and automation. From Git hooks to submodules, reflog, and branching strategies, these advanced tools help streamline development in both individual and team environments.
Stay tuned for more on how to further optimize your Git workflows in upcoming articles.