Automatic Commit Messages with GPT-3

Dec 27, 2022

How to generate your git commit messages with GPT-3 using the OpenAI API.

Minimal Example

The minimal example is really simple. These 28 lines of bash get the job done:

#!/usr/bin/env bash

# https://beta.openai.com/account/api-keys
OPENAI_TOKEN="..."

PROMPT="===== BEGIN GIT DIFF =====
$(git diff --cached --no-color)
===== END GIT DIFF =====

Write a git commit message for this diff.

Generated Commit Message:
"

PROMPT_JSONENCODED="$(echo -n "${PROMPT}" | jq -Rsa .)"

REQUEST_JSON="{
    \"model\": \"text-davinci-003\",
    \"prompt\": ${PROMPT_JSONENCODED},
    \"max_tokens\": 256,
    \"temperature\": 0.7
}"

RESPONSE_JSON="$(curl --silent --url "https://api.openai.com/v1/completions" --header "Authorization: Bearer ${OPENAI_TOKEN}" --header "Content-Type: application/json" --data-raw "${REQUEST_JSON}")"

RESPONSE_TEXT="$(printf "%s" "${RESPONSE_JSON}" | jq --raw-output ".choices[0].text")"

git commit -m "${RESPONSE_TEXT}"
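The hand-written REQUEST_JSON works because jq encodes the prompt before it is interpolated. As a minimal sketch of an alternative, the two helpers below let jq build the whole request body and refuse to continue when the response carries no completion. The names build_request_json and extract_completion are made up for this example, and the idea that a failed call returns JSON without a choices array is an assumption, not something verified here for every error case.

```shell
# Build the request body with jq instead of hand-escaped JSON.
# The prompt is passed as a jq variable, so quotes and newlines are encoded safely.
build_request_json() {
    jq --null-input \
        --arg prompt "${1}" \
        '{model: "text-davinci-003", prompt: $prompt, max_tokens: 256, temperature: 0.7}'
}

# Print the completion text, or exit non-zero when there is none.
# "// empty" produces no output for a missing field, and --exit-status
# turns "no output" into a failing exit code.
extract_completion() {
    printf '%s' "${1}" | jq --exit-status --raw-output '.choices[0].text // empty'
}

# Possible use inside the script (not executed here):
#   REQUEST_JSON="$(build_request_json "${PROMPT}")"
#   if ! RESPONSE_TEXT="$(extract_completion "${RESPONSE_JSON}")"; then
#       echo "No completion in API response" >&2
#       exit 1
#   fi
```

Without such a check, a failed request makes jq print the literal text null, and that is what would end up as the commit message.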

Improvement: Trimming

We can improve this script with two kinds of trimming:

The message sometimes starts with a prefix such as Fix: or Refactor:, which we can remove.
The message sometimes has leading whitespace, which we can trim.

Let us first trim those occasional prefixes:

function remove_prefix {
    local STRING="${1}"
    local PREFIX="${2}"
    echo "${STRING#"${PREFIX}"}"
}

FIRST_LINE="$(echo "${RESPONSE_TEXT}" | head -n 1)"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "Fix:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "fix:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "Refactor:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "refactor:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "Chore:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "chore:")"
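The six nearly identical calls can also be collapsed into a loop. A minimal sketch, where strip_prefixes is a hypothetical helper name and the example message is made up:

```shell
# Remove each given prefix from the start of a line, in the order given.
# Usage: strip_prefixes LINE PREFIX...
strip_prefixes() {
    local line="${1}"
    shift
    local prefix
    for prefix in "$@"; do
        # ${line#pattern} drops the prefix only if the line starts with it
        line="${line#"${prefix}"}"
    done
    printf '%s\n' "${line}"
}

FIRST_LINE="Refactor: simplify token handling"   # made-up example message
FIRST_LINE="$(strip_prefixes "${FIRST_LINE}" "Fix:" "fix:" "Refactor:" "refactor:" "Chore:" "chore:")"
printf '[%s]\n' "${FIRST_LINE}"   # prints "[ simplify token handling]"
```

The space after the colon is left in place here, which is why the whitespace trimming step still matters.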

Then trim the whitespace:

function trim {
    local var="${*}"

    # remove leading whitespace characters
    var="${var#"${var%%[![:space:]]*}"}"

    # remove trailing whitespace characters
    var="${var%"${var##*[![:space:]]}"}"

    echo -n "${var}"
}

FIRST_LINE="$(trim "${FIRST_LINE}")"
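The two parameter expansions in trim are dense, so here is the same logic unrolled on a made-up string. The intermediate variables leading and trailing exist only for illustration:

```shell
var="   Fix login bug  "   # made-up message with stray whitespace

# Strip the longest suffix that starts with a non-space character.
# What remains is exactly the leading whitespace.
leading="${var%%[![:space:]]*}"
var="${var#"${leading}"}"

# Strip the longest prefix that ends with a non-space character.
# What remains is exactly the trailing whitespace.
trailing="${var##*[![:space:]]}"
var="${var%"${trailing}"}"

printf '[%s]\n' "${var}"   # prints "[Fix login bug]"
```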

Improvement: Token Limit

The git diff might be too long and we must limit it to a certain length.

We are using the most powerful model, text-davinci-003. According to https://beta.openai.com/docs/models/gpt-3, that model handles at most 4000 tokens. Those 4000 tokens cover both the prompt and the completion, so limiting the diff to 3000 tokens leaves room for the rest of the prompt and the 256-token completion:

FULL_DIFF="$(git diff --cached --no-color)"
TOKEN_LIMIT="3000"
CHAR_LIMIT=$((TOKEN_LIMIT * 36999 / 13914))
DIFF="${FULL_DIFF:0:${CHAR_LIMIT}}"

Tokens are something in between characters and words. I came up with the ratio 36999 / 13914 (about 2.66 characters per token) by running a real git diff through https://beta.openai.com/tokenizer.
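To make the arithmetic concrete, the sketch below evaluates the limit and also runs the ratio in the other direction to estimate tokens from a character count. The helper estimate_tokens is a hypothetical name, and the estimate is only as good as the single diff the ratio was measured on:

```shell
TOKEN_LIMIT=3000
# Shell arithmetic is integer-only: 3000 * 36999 / 13914 rounds down to 7977.
CHAR_LIMIT=$((TOKEN_LIMIT * 36999 / 13914))
echo "${CHAR_LIMIT}"   # prints 7977

# Rough inverse: estimate how many tokens a string would use.
estimate_tokens() {
    local text="${1}"
    echo $(( ${#text} * 13914 / 36999 ))
}

# A 34-character diff header comes out at roughly 12 tokens.
estimate_tokens "diff --git a/README.md b/README.md"   # prints 12
```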

Improvement: Haiku

We can also ask GPT-3 to write a haiku describing the git diff and place it in the extended commit message, just by adding "Then write a haiku that describes the changes made." to the prompt.

I also tried other forms such as "Heroic Ballad" and "K-Pop Song", but after some use I realized the output was too long: too long for me to read every time, and too much clutter in the git log output for colleagues.

The Haiku offers a short moment of meditation. You can take a deep breath. Briefly digest the changes you made.

Final Script

#!/usr/bin/env bash

# https://beta.openai.com/account/api-keys
OPENAI_TOKEN="..."

FULL_DIFF="$(git diff --cached --no-color)"
TOKEN_LIMIT="3000"
CHAR_LIMIT=$((TOKEN_LIMIT * 36999 / 13914))
DIFF="${FULL_DIFF:0:${CHAR_LIMIT}}"

PROMPT="===== BEGIN GIT DIFF =====
${DIFF}
===== END GIT DIFF =====

Write a git commit message for this diff.

Then write a haiku that describes the changes made.

Generated Commit Message:
"

PROMPT_JSONENCODED="$(echo -n "${PROMPT}" | jq -Rsa .)"

REQUEST_JSON="{
    \"model\": \"text-davinci-003\",
    \"prompt\": ${PROMPT_JSONENCODED},
    \"max_tokens\": 256,
    \"temperature\": 0.7
}"

RESPONSE_JSON="$(curl --silent --url "https://api.openai.com/v1/completions" --header "Authorization: Bearer ${OPENAI_TOKEN}" --header "Content-Type: application/json" --data-raw "${REQUEST_JSON}")"

RESPONSE_TEXT="$(printf "%s" "${RESPONSE_JSON}" | jq --raw-output ".choices[0].text")"

function trim {
    local var="${*}"

    # remove leading whitespace characters
    var="${var#"${var%%[![:space:]]*}"}"

    # remove trailing whitespace characters
    var="${var%"${var##*[![:space:]]}"}"

    echo -n "${var}"
}

function remove_prefix {
    local STRING="${1}"
    local PREFIX="${2}"
    echo "${STRING#"${PREFIX}"}"
}

FIRST_LINE="$(echo "${RESPONSE_TEXT}" | head -n 1)"
FIRST_LINE="$(trim "${FIRST_LINE}")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "Fix:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "fix:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "Refactor:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "refactor:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "Chore:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "chore:")"
FIRST_LINE="$(trim "${FIRST_LINE}")"

REST="$(echo "${RESPONSE_TEXT}" | tail -n +2)"
REST="$(trim "${REST}")"
REST="$(remove_prefix "${REST}" "Haiku:")"
REST="$(remove_prefix "${REST}" "haiku:")"
REST="$(trim "${REST}")"

RESULT="${FIRST_LINE}

Haiku:

${REST}

(Automatic commit message by OpenAI)"

git commit -m "${RESULT}"
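One case the script does not cover is a completion that begins with a blank line: head -n 1 would then return an empty subject. Whether the API actually does this for this prompt is an assumption, not something the examples here confirm. A minimal sketch of trimming the whole response before splitting it, using a made-up response text:

```shell
# Same trim as in the script, repeated so this sketch runs on its own.
trim() {
    local var="${*}"
    var="${var#"${var%%[![:space:]]*}"}"
    var="${var%"${var##*[![:space:]]}"}"
    printf '%s' "${var}"
}

# Made-up response with a leading blank line, for illustration only.
RESPONSE_TEXT="
Update mock files

Haiku:
Mocks were out of date"

# Trim first, so the subject ends up on line 1 before splitting.
TRIMMED="$(trim "${RESPONSE_TEXT}")"
FIRST_LINE="$(printf '%s\n' "${TRIMMED}" | head -n 1)"
REST="$(printf '%s\n' "${TRIMMED}" | tail -n +2)"
REST="$(trim "${REST}")"

printf 'subject: [%s]\n' "${FIRST_LINE}"   # prints "subject: [Update mock files]"
```

The prefix removal for FIRST_LINE and REST can then proceed exactly as in the script.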

Usage Examples

Here are a few usage examples based on real commits made to open source repositories:

https://github.com/flutter/flutter/commit/f11fbbafca425706f811e730c581eb0a3a723823

Fix entitlements for macos debug profile

Haiku:

Entitlements updated
Debug profile now moored tight
Sandbox can now sleep

(Automatic commit message by OpenAI)

https://github.com/home-assistant/core/commit/eae81547531cb830eff0eed791133a93c3735e61

Add fixtures and tests for cover node percentage

Haiku:

Cover child node added
Tests for opening, closing, stop
Position set to fifty

(Automatic commit message by OpenAI)

https://github.com/kubernetes/kubernetes/commit/ff7ba89b1cdcd22d31641951489f317ead1acc9f

Update mock files

Haiku:

Mocks were out of date
Quickly update them all now
Everything is fresh

(Automatic commit message by OpenAI)

Usage Considerations

For really large commits we'll use about 3000 tokens per API call. Looking at https://openai.com/api/pricing/ that would be $0.06 per commit message. I make 30 commits on a really busy day. So that's around $2 daily for me in the worst case.
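The estimate can be spelled out as integer arithmetic in cents. The rate of 2 cents per 1000 tokens is not quoted above; it is simply what $0.06 for 3000 tokens implies, so treat it as an assumption and check the pricing page:

```shell
TOKENS_PER_COMMIT=3000
CENTS_PER_1K_TOKENS=2    # assumption: implied by $0.06 per 3000 tokens
COMMITS_PER_DAY=30

COST_PER_COMMIT_CENTS=$((TOKENS_PER_COMMIT * CENTS_PER_1K_TOKENS / 1000))
DAILY_COST_CENTS=$((COST_PER_COMMIT_CENTS * COMMITS_PER_DAY))

# Split the cent amounts into dollars and cents for display.
printf '$%d.%02d per commit, $%d.%02d per day\n' \
    $((COST_PER_COMMIT_CENTS / 100)) $((COST_PER_COMMIT_CENTS % 100)) \
    $((DAILY_COST_CENTS / 100)) $((DAILY_COST_CENTS % 100))
# prints "$0.06 per commit, $1.80 per day"
```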

The commit messages are not 100% accurate. Half of the time they are actually great. The other half of the time they are outright lies that don't reflect the work done whatsoever. Depending on where you work this may or may not matter. Will your colleagues actually read your commit messages anyway?

The API call takes around 3 seconds. About the time it would take me to write an equally poor commit message. But with OpenAI I don't have to think and I get that nice Haiku as a bonus!