Jiby's toolbox

Jb Doyon’s personal website

My git workflow

Posted on — Jun 30, 2023

Every now and then, at work, I find myself discussing git workflows, commit messages, branching, releasing, versioning, changelogs etc. Since my opinion has remained fairly consistent for the past few years, I found myself repeating the same points a lot, so I wrote them down. This page is the resulting compilation of my opinions on the software development lifecycle (SDLC), without workplace-specific tangents.

The article is broken down into ideas + recommendations, with recommendations in bold. Obviously, this workflow isn’t applicable to everyone (sorry, Google! This isn’t a free consultation!) as different teams have needs that should get dealt with differently.

The intent, beyond expressing my views on the internet, is to share “the big picture” of my personal framework for software development, beyond just “git tags are good” or other overly-specific advice.

Scope: The resulting document describes, I hope, a workflow that keeps overhead fairly low, works at small team scale (1 to 20 active devs per git repository), while keeping reasonable traceability. That’s where I find myself developing professionally most of my time.

Summary: The philosophy can be described as (Scaled) Trunk-based development with heavy (personal) commit message discipline, tag-based versioning, and overall adherence to the 12 Factors.

Git commit guidelines

Follow Chris Beams’ git commit message guidelines with self-discipline

A great reference: https://chris.beams.io/posts/git-commit. The post explains the classic “Seven Rules”:

  1. Separate subject from body with a blank line

  2. Limit the subject line to 50 characters

  3. Capitalize the subject line

  4. Do not end the subject line with a period

  5. Use the imperative mood in the subject line

  6. Wrap the body at 72 characters

  7. Use the body to explain what and why vs. how
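Some of these rules are mechanical enough to check automatically. As a rough sketch (the function name `check_commit_message` is made up for illustration, and rules 5 and 7 are deliberately left out since mood and intent need a human reader), a checker could look like this:

```python
def check_commit_message(message: str) -> list[str]:
    """Return the rules a commit message breaks; an empty list means none found.

    Hypothetical helper: only covers the rules a script can reasonably judge.
    """
    problems = []
    lines = message.rstrip("\n").split("\n")
    subject = lines[0]

    # Rule 1: a body, if present, is separated from the subject by a blank line
    if len(lines) > 1 and lines[1].strip() != "":
        problems.append("separate subject from body with a blank line")
    # Rule 2: subject line of at most 50 characters
    if len(subject) > 50:
        problems.append("limit the subject line to 50 characters")
    # Rule 3: subject line starts with a capital letter
    if subject[:1].islower():
        problems.append("capitalize the subject line")
    # Rule 4: no trailing period on the subject line
    if subject.endswith("."):
        problems.append("do not end the subject line with a period")
    # Rule 6: every line after the subject is wrapped at 72 characters
    if any(len(line) > 72 for line in lines[1:]):
        problems.append("wrap the body at 72 characters")
    return problems


if __name__ == "__main__":
    print(check_commit_message("fixed the thing.\nmore details"))
```

A script like this is only a safety net: it cannot tell whether the body explains the why, which is the part that matters most.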

Recommend formalizing the adoption of that guide as reference: though these are already the best practices of the git community, having a reference to point to helps everyone, e.g. for onboarding.

Recommend against using git commit -m: learn to use your editor instead. Similarly, avoid git add --all (aka git add -A). Instead, learn git add -p for interactive hunk-by-hunk staging. I found that learning to tune contents and message of commits early on leads to better commits in the long term, and the flags described above are at best crutches that prevent such learning.

Branching strategy

Use trunk-based development, squash-rebasing PRs, staying mindful of the flow from commit messages → PR description → commit in master

Consider using “main” instead of “master”, not so much for ideological reasons, but because it’s shorter to type and is clearer for non-latin speakers. Once the dust settled on the main vs master debate, I found “main” aligns with common parlance around “the main branch”, as we don’t do much “mastering”.

Use the main branch for releases (see versioning section below), no “dev” branch (split off main branch for features instead), no “releases” branch (we don’t need to backport fixes to old releases, just fix it for the next one).

Use squash-rebase approach usually, to keep the git log tidy (non-fastforward merges can cause clutter in git graph horizontally, and squashing avoids commits saying “update per review” = vertical clutter). Ideally, maximize the number of small PRs, which could lead to 10 small PRs/commits for a big feature.

If not multiple small PRs, single PRs with <10 individually-meaningful (small) commits can acceptably be merged rather than squashed, to track the feature as a block. Balance of intent legibility vs history-keeping is an art anyway.

Since, in Github, the primary source of git commit messages is the PR merge button, the PR merge message should follow the commit guidelines outlined above (though the Github UI makes it hard to stick to 72 chars).

Note that the PR merge message is usually pre-filled using git log of the commits sent for review, so PR descriptions should (ideally) just be the original content of git log of the commits: We should be able to trace the original commit message → PR description → git commit on master as all identical or at least very similar.

This is recommended because even though we could technically write silly “Haaaaaands” commit messages locally and still rework them into eloquent commit messages at the end, I believe that “we play like we practice”. Encouraging good commit message hygiene, even locally for branches, is a good habit that produces neat commit messages after merging.

Making meaningful commits on the first try isn’t obvious: Each PR sent for review should have only meaningful commits, so devs should get comfortable rebasing and reworking a branch into different commits, molding messages on the way. Suggest going through the learn visual branching exercises to get confident. While experimenting, Github PRs should be marked as “Draft” to spare reviewers’ time.

There should be a link between PRs and (Jira/equivalent) ticket on most work, as most work should have tickets anyway for more business/technical context. Recommend PR titles have ticket prefix in square-brackets like [ABC-123] for linking.
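If the prefix convention is to be checked by tooling (say, in CI), a small sketch could look like the following. The ticket key shape (uppercase project key, dash, number) and the name `ticket_from_pr_title` are assumptions for illustration, to be adapted to whatever the ticket system actually produces:

```python
import re
from typing import Optional

# Assumed ticket key shape: uppercase project key, dash, number (like ABC-123),
# in square brackets at the very start of the title, followed by the real title.
TICKET_PREFIX = re.compile(r"^\[([A-Z][A-Z0-9]*-\d+)\]\s+\S")


def ticket_from_pr_title(title: str) -> Optional[str]:
    """Return the ticket key from a PR title, or None if there is no prefix."""
    match = TICKET_PREFIX.match(title)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(ticket_from_pr_title("[ABC-123] Add retry to upload client"))
```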

Avoid prefixing commit messages with tickets on experimental work, as it could mean 20+ commits with same prefix, all to squash it back into 1. Consider instead using a Draft PR with ticket in title, and commit messages without ticket link. Using ticket as branch name is discouraged as it makes it impossible to understand what a branch is meant for without going online to look up tickets.

Remember that branching/commit guidelines are meant to facilitate work from future developers, who may be bisecting years-old code, searching for a bug, or an architectural decision that may need overturning due to new context. Be considerate of this future human. An old quote summarizes this as “Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. Code for readability”.

Versioning

Use either semantic versioning (libraries) or calendar versioning (apps), with git tags for marking the released commit.

To avoid “every commit on main branch may-or-may-not be a release, we don’t really know”, use git tags on main branch to signify release. Tag name is the new version after bump. Even though every main commit must be deployable, not every main commit is released = tagged = deployed.

Process for releasing goes:

Presence of a tag with known prefix (usually v) on a commit makes something a Release. Tags without prefix can be used for marking hotfixes or other history information that may be useful (last known working, in-the-field changes etc). Github’s “Release” feature may be used to attach changelog information to the git tag, as well as automate notifications for devs/users, but the git tag itself is the source of truth, the Github Release is only a beautified version of the tag.

Note that the above does not require releases to be git tags on the main branch, only that tags have a known prefix to mark as releases. This leeway technically enables off-main-branch releases, which can be a compliance nightmare, deal-breaker for some, or a life-saver for others. Consider if requiring releases to be cut from the main branch is a useful restriction for your project.

I personally am fond of annotated git tags (--annotate = -a flag of git tag) for adding more info about the release, such as a link to version’s documentation or release-tracking ticket/workboard as relevant, but this is not always needed nor worth it.

# Set BUMP variable to any of poetry-supported (major, minor, patch)
# or number (1.2.3 etc), see 'poetry version' docs for details
.PHONY: release
# Default the bump to a patch (v1.2.3 -> v1.2.4)
release: BUMP=patch
release:
# Set the new version Makefile variable after the version bump
	$(eval NEW_VERSION := $(shell poetry version --short ${BUMP}))
	$(eval TMP_CHANGELOG := $(shell mktemp))
	sed \
		"s/\(## \[Unreleased\]\)/\1\n\n## v${NEW_VERSION} - $(shell date +%Y-%m-%d)/" \
		CHANGELOG.md > ${TMP_CHANGELOG}
	mv --force ${TMP_CHANGELOG} CHANGELOG.md
	git add CHANGELOG.md pyproject.toml
	git commit -m "Bump to version v${NEW_VERSION}"
	git tag --annotate "v${NEW_VERSION}" \
		--message "Release v${NEW_VERSION}"
Code Snippet 1: Automation, via a Makefile entry for Python projects. Running make release BUMP=patch creates a release commit + tag, finalizing the unreleased Changelog.

Following 12 Factors “Build-Release-Run”: “Releases are an append-only ledger and a release cannot be mutated once it is created. Any change must create a new release.”

Version naming technique: recommend usually either semantic or calendar versioning, depending on whether you are writing a library or an app. For a broader versioning spec, see the PEP 440 standard.

For libraries, semantic versioning spells out the API as a contract, bumping major/minor/patch whenever that contract changes. Note that this can be confusing for non-technical users (“a big feature was added but this is a minor release?” or “you said this was just a bugfix, why is this a major release?”).
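To make the bump rules concrete, here is a minimal sketch of how each kind of bump changes a plain MAJOR.MINOR.PATCH number. The `bump` function is hypothetical (in practice a tool like poetry version does this), and it ignores pre-release and build suffixes:

```python
def bump(version: str, part: str) -> str:
    """Bump a plain MAJOR.MINOR.PATCH version string.

    Illustrative only: no support for pre-release or build metadata.
    """
    major, minor, patch = (int(number) for number in version.split("."))
    if part == "major":  # contract broken: incompatible API change
        return f"{major + 1}.0.0"
    if part == "minor":  # contract extended: backwards-compatible feature
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # contract unchanged: backwards-compatible fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


if __name__ == "__main__":
    for part in ("patch", "minor", "major"):
        print(part, bump("1.2.3", part))
```

The size of the change plays no role here, only its effect on the contract. This is exactly why a one-line bugfix that changes behaviour callers rely on ends up as a major release.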

For “apps” (programs NOT directly used as building block of another developer, consumed by users instead) calendar versioning is a good fit, using sequential numbers that always increase, with flexibility between full Y-M-D numbering (e.g. “v2021-01-10”), more lax Y.minor (e.g. “2021.14” meaning 14th release of year 2021), other combinations like sprintnumber.minor (e.g. “41.3” for 3rd release of Sprint 41, with a couple weeks between sprints) and variants like Y.M.minor (e.g. “2021.1.6” for 6th version of January 2021, which is what Ubuntu does). Anything that has increasing numbers, and is sortable, works.
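One caveat on “sortable”: schemes without zero-padding, like Y.minor, sort correctly as numbers but not as plain strings. A small sketch (the helper name `version_key` is made up for this example) shows the difference:

```python
import re


def version_key(version: str) -> tuple[int, ...]:
    """Turn '2021.14' or 'v2021-01-10' into a tuple of ints for sorting."""
    return tuple(int(part) for part in re.split(r"[.-]", version.lstrip("v")))


releases = ["2021.9", "2021.14", "2021.2"]

# Plain string sorting compares character by character, so "14" lands before "2"
as_strings = sorted(releases)
# Sorting on the numeric parts gives the actual release order
as_numbers = sorted(releases, key=version_key)

if __name__ == "__main__":
    print(as_strings)
    print(as_numbers)
```

Zero-padded schemes like full Y-M-D dates avoid the issue entirely, since their string order and numeric order agree.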

In rare cases, a single version number is not enough: Dual versioning is a niche solution when addressing both techies and end users (or marketing dept) by keeping two different versions at once. See Android, which has “API levels” for techies, increasing differently from Android “versions” for users.

Changelog

Have a Changelog file in-repo, commonly updated as part of feature work. Use keepachangelog format. No git log dumps or conventional commits.

To avoid having only git log as reference of “what changed in new code” for users, have explicit changelogs aimed at the (tech-averse) “customer”.

Use keepachangelog.com format: CHANGELOG.md file mandatory in repo, each release gets its own section with release name + timestamp, and a line at the top explaining what versioning this project uses (see above).

Updating the changelog should be (mandatory?) part of PR workflow, in order to make the release step as simple as replacing “## Unreleased” section with the date of release. (See Snippet 1 for automating this)
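For projects that would rather not rely on sed, the same “finalize the changelog” step can be sketched in Python. This assumes the `## [Unreleased]` heading used in Snippet 1, and the function name `finalize_changelog` is hypothetical:

```python
import datetime

# Heading assumed to match the one the Makefile snippet looks for
UNRELEASED = "## [Unreleased]"


def finalize_changelog(text: str, version: str, released: datetime.date) -> str:
    """Insert a release heading under 'Unreleased', keeping that section on top.

    Entries previously listed as unreleased end up under the new release
    heading, and a fresh, empty Unreleased section remains for future work.
    """
    if UNRELEASED not in text:
        raise ValueError("no Unreleased section found in changelog")
    release_heading = f"## v{version} - {released.isoformat()}"
    return text.replace(UNRELEASED, f"{UNRELEASED}\n\n{release_heading}", 1)


if __name__ == "__main__":
    changelog = "# Changelog\n\n## [Unreleased]\n\n- Add retry to uploads\n"
    print(finalize_changelog(changelog, "1.2.4", datetime.date.today()))
```

Raising an error when the heading is missing is a deliberate choice in this sketch: a release that silently skips the changelog defeats the purpose of keeping one.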

Remember that knowing what was released is a vital part of IT operations: No one wants to be paged to fix a system at 3AM, and be answered “I dunno what changed, check the git log I guess”.

Recommend against use of Conventional Commits. These cause developers to confuse the targets of their words: release notes (changelogs) are for end users, and commits are for developers running git bisect in anger. The level of detail and what’s worthy of mention is not the same. Conventional commits, I believe, encourage sloppy commit messages, and are a premature optimization (via automation) of the development process, in a place that isn’t a performance bottleneck. Time spent with tools like commitizen is, I believe, better invested in working with devs on improving commit message practices.

For a similar reason, recommend against PR and issue templates, as I found them to encourage sloppy “check the (compliance) box” thinking, rather than honest evaluation of the content of the commits.

Code review process

Every code change should be reviewed by another human. Use Google’s Code review guidelines. Remember a good code review uncovers not just bugs, but knowledge gaps, fixing information silos. Use pre-commit, enforced in CI, to avoid talking about petty issues like formatting and linting.

Code review is well known to improve code quality. It encourages standards, it shares information about recent code changes, spreading knowledge of the codebase to multiple people, reducing risk.

Recommend adopting the Google Code review guidelines, which solidify the role of reviewer and author (just be aware that “CL” is a Google-specific term that means “changelist” = PR/diff).

Remember that code review is not just to squash bugs, but to spread knowledge: both general teaching, and specific system knowledge. This is the time to ask questions and talk about architecture.

The most hardcore yardstick of PR review is that approving a PR should mean the reviewer can explain the code, and be ready to maintain it themselves (without help) going forward. Adapt this standard as needed to balance review speed and thoroughness of knowledge exchange.

Note though that code review is NOT the right place to discuss petty matters like formatting or end-of-file newlines: this stuff is best handed off to automation like formatter tools. These can usually be set in auto-fix mode, replacing nagging messages with automated fixes. Recommend use of the pre-commit tool to handle formatting and linting, enforced via CI to ensure compliance.

Conclusion

This is an overview of the views on Software Development Lifecycle (SDLC) I’ve held for the past few years, when building software professionally at a fairly small scale, with admittedly not a lot of focus on post-release deployment and support.

A lot of the practices described here can be found in my personal Python project template, as described in a previous post about project templates.

As I’m working on a more “operational” role these days, I’ll keep checking if I change my views on anything above. I’ll report if anything changes in another post.