Jiby's toolbox

Jb Doyon’s personal website

Git diff for prose

Posted on — Oct 6, 2019

When writing long sentences in documentation repositories, git tends to show really unhelpful diffs. They are unreadable because long lines aren’t broken, which hides edits happening towards the end of the line. A colleague of mine asked me if git couldn’t be configured to make this sort of thing more obvious. Challenge accepted!

Figure 1: Can you spot the edit made in a long line of text?

Kaushal Modi’s blog post on git diff for minified JS and CSS inspired this idea for all you prose lovers. Essentially, we’ll tell git to preprocess files with a command that puts each sentence on its own line before running git diff.

To do this, we first create a script that replaces a period followed by one or more spaces with a period and a newline. This is a good enough heuristic to distinguish sentences, but feel free to come up with a more appropriate one (fellow Americans, I heard you might want two spaces after your full stops).

#!/bin/sh
sed -r -e 's/\. +/.\n/g' "$@"  # one sentence per line (GNU sed syntax)
Code Snippet 1: New "breaksentences" script
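To see what the script does before installing anything, here is a minimal sketch that pipes a made-up paragraph through the same sed expression. It assumes GNU sed: -r and \n in the replacement are GNU extensions, and BSD/macOS sed may not treat them the same way.

```shell
#!/bin/sh
# Sample paragraph: one space after the first full stop, two after the second.
text='First sentence. Second one.  Third, after two spaces.'

# Same expression as breaksentences: a period followed by one or more
# spaces becomes a period followed by a newline.
printf '%s\n' "$text" | sed -r -e 's/\. +/.\n/g'
```

This prints the three sentences on separate lines. Because the pattern uses " +", double spacing is already covered; the flip side of such a simple rule is that abbreviations like "e.g. " also get split.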

Once the script is saved somewhere on your $PATH and made executable (check it by running breaksentences myfile.txt), it can be registered as a “diff driver” in git config, either globally in ~/.gitconfig or for a single repository via .git/config. Once the driver is defined, a .gitattributes file can assign it to matching files. This is all thanks to the magic of gitattributes(5) and git-config(5), and their concept of “diff drivers”.
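One possible way to do the installation step, sketched under the assumption that ~/bin is (or will be) on your $PATH; BIN_DIR is just a hypothetical override if you prefer another directory. It writes the script with a shebang, marks it executable, and checks that the shell finds it.

```shell
#!/bin/sh
# Assumption: ~/bin is on your $PATH; set BIN_DIR to use another directory.
BIN_DIR=${BIN_DIR:-"$HOME/bin"}
mkdir -p "$BIN_DIR"

# The quoted 'EOF' keeps "$@" literal while the file is written.
cat > "$BIN_DIR/breaksentences" <<'EOF'
#!/bin/sh
# One sentence per line: a period plus one or more spaces becomes ".\n".
exec sed -r -e 's/\. +/.\n/g' "$@"
EOF
chmod +x "$BIN_DIR/breaksentences"

# Make it visible in this session and confirm the lookup works.
PATH="$BIN_DIR:$PATH"
export PATH
command -v breaksentences
```

With no file argument the script reads standard input, so printf 'One. Two.\n' | breaksentences is a quick smoke test.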

[diff "sentences"]
	textconv = breaksentences
Code Snippet 2: Defining a new diff driver in ~/.gitconfig
*.md diff=sentences
Code Snippet 3: Using new diff driver in .gitattributes
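Putting the pieces together, here is a sketch that runs the whole setup in a throwaway repository, so you can watch the driver work before touching a real project. It assumes git and GNU sed are installed; the repository, file name, and sentences are made up, and the script is recreated locally so the sketch runs on its own. The command git config diff.sentences.textconv breaksentences writes the same [diff "sentences"] section as Code Snippet 2, just into the repository’s .git/config.

```shell
#!/bin/sh
set -e

# Throwaway repository, so nothing in your real projects changes.
demo=$(mktemp -d)
cd "$demo"

# Local stand-in for the breaksentences script (GNU sed assumed).
mkdir bin
cat > bin/breaksentences <<'EOF'
#!/bin/sh
exec sed -r -e 's/\. +/.\n/g' "$@"
EOF
chmod +x bin/breaksentences
PATH="$demo/bin:$PATH"
export PATH

git init -q
git config user.name demo              # identity only needed for the commit
git config user.email demo@example.com
# Same effect as the [diff "sentences"] section, stored in .git/config.
git config diff.sentences.textconv breaksentences
echo '*.md diff=sentences' > .gitattributes

echo 'Git is handy. Long lines hide edits. The end is near.' > notes.md
git add .gitattributes notes.md
git commit -q -m 'Add notes'

# Edit one word in the middle of the long line.
echo 'Git is handy. Long lines hide small edits. The end is near.' > notes.md

git diff                 # sentence-level diff via the textconv driver
git diff --no-textconv   # the raw, single-line diff for comparison
```

The first git diff should show only the middle sentence as removed and re-added, while the --no-textconv run shows the whole paragraph as one changed line. Delete the temporary directory when you are done.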

Feel free to edit the wildcard to match more than just .md files! Your diffs should now look nice!

Figure 2: Editing in a long line of text is a chore (before)

Figure 3: Clearer diff, yey! (after)

Remember that this setup doesn’t change the files themselves, only what git shows when diffing them. Also be aware that diff drivers can interfere with interactive tools that use diff output to stage files. This can be a deal-breaker for some, but it’s still neat to learn about git magic. Powerful stuff.
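To make that caveat concrete, here is a sketch in a throwaway repository (git and GNU sed assumed; the file name and text are made up). It shows that the file on disk stays a single line, and that a patch saved from the sentence-split git diff no longer applies with git apply, while git diff --no-textconv still yields a patch that does. A tool that feeds git diff output back into git apply could plausibly trip over exactly this.

```shell
#!/bin/sh
set -e

# Throwaway repo with the sentences driver (names and text are made up).
demo=$(mktemp -d)
cd "$demo"
mkdir bin
cat > bin/breaksentences <<'EOF'
#!/bin/sh
exec sed -r -e 's/\. +/.\n/g' "$@"
EOF
chmod +x bin/breaksentences
PATH="$demo/bin:$PATH"
export PATH
git init -q
git config user.name demo
git config user.email demo@example.com
git config diff.sentences.textconv breaksentences
echo '*.md diff=sentences' > .gitattributes
echo 'One. Two. Three.' > notes.md
git add .gitattributes notes.md
git commit -q -m initial
echo 'One. Two, edited. Three.' > notes.md

# The file on disk is still one long line; only the diff view is split.
wc -l < notes.md

# Capture both kinds of diff, then restore the committed version.
git diff > sentences.patch
git diff --no-textconv > raw.patch
git checkout -- notes.md

# The sentence-split patch expects three lines that do not exist on disk...
git apply --check sentences.patch 2>/dev/null || echo 'sentences.patch does not apply'
# ...whereas the raw patch applies cleanly.
git apply --check raw.patch && echo 'raw.patch applies'
```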