Automatic Commit Messages with GPT-3
How to generate your git commit messages with GPT-3 using the OpenAI API.
Minimal Example
The minimal example is really simple. These 28 lines of bash get the job done:
#!/usr/bin/env bash
# https://beta.openai.com/account/api-keys
OPENAI_TOKEN="..."
PROMPT="===== BEGIN GIT DIFF =====
$(git diff --cached --no-color)
===== END GIT DIFF =====
Write a git commit message for this diff.
Generated Commit Message:
"
PROMPT_JSONENCODED="$(echo -n "${PROMPT}" | jq -Rsa .)"
REQUEST_JSON="{
\"model\": \"text-davinci-003\",
\"prompt\": ${PROMPT_JSONENCODED},
\"max_tokens\": 256,
\"temperature\": 0.7
}"
RESPONSE_JSON="$(curl --silent --url "https://api.openai.com/v1/completions" --header "Authorization: Bearer ${OPENAI_TOKEN}" --header "Content-Type: application/json" --data-raw "${REQUEST_JSON}")"
RESPONSE_TEXT="$(printf "%s" "${RESPONSE_JSON}" | jq --raw-output ".choices[0].text")"
git commit -m "${RESPONSE_TEXT}"
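One thing the minimal script does not handle is a failed API call. If the response carries no choices (for example because the token is wrong), jq prints null and that would end up as the commit message. Here is a small sketch of a guard. The name extract_completion is a hypothetical helper of mine, and the sample responses are made up for illustration:

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the completion text, or fail if there is none.
extract_completion() {
  local response_json="${1}"
  # "// empty" produces no output when the text is missing or null, and
  # --exit-status then makes jq return a non-zero exit code.
  printf '%s' "${response_json}" \
    | jq --exit-status --raw-output '.choices[0].text // empty'
}

# Usage with a made-up sample response (no network involved):
SAMPLE='{"choices":[{"text":"Update mock files"}]}'
if MESSAGE="$(extract_completion "${SAMPLE}")"; then
  echo "would commit: ${MESSAGE}"
else
  echo "no completion in response, not committing" >&2
fi
```

In the real script you would pass "${RESPONSE_JSON}" instead of the sample and only run git commit in the success branch.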
Improvement: Trimming
We can improve this script with two kinds of trimming:
- The message randomly starts with prefixes like Fix: or Refactor: and we could remove those.
- The message could use trimming since it sometimes has leading whitespace.
Let us first trim those occasional prefixes:
function remove_prefix {
local STRING="${1}"
local PREFIX="${2}"
echo "${STRING#"${PREFIX}"}"
}
FIRST_LINE="$(echo "${RESPONSE_TEXT}" | head -n 1)"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "Fix:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "fix:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "Refactor:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "refactor:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "Chore:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "chore:")"
Then trim the whitespace:
function trim {
local var="${*}"
# remove leading whitespace characters
var="${var#"${var%%[![:space:]]*}"}"
# remove trailing whitespace characters
var="${var%"${var##*[![:space:]]}"}"
echo -n "${var}"
}
FIRST_LINE="$(trim "${FIRST_LINE}")"
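To see what the two steps do together, here is a sketch that folds the prefix list into a loop and runs it on a made-up sample line. The name clean_subject is hypothetical, and I trim once before stripping the prefix so that a leading space does not hide it:

```bash
#!/usr/bin/env bash

# Hypothetical helper: trim a subject line and strip a leading label.
clean_subject() {
  local line="${1}"
  local prefix
  # Remove leading whitespace so the prefix check starts at the first word.
  line="${line#"${line%%[![:space:]]*}"}"
  # Strip any of the known labels if the line starts with it.
  for prefix in Fix fix Refactor refactor Chore chore; do
    line="${line#"${prefix}:"}"
  done
  # Remove the whitespace that followed the label, then trailing whitespace.
  line="${line#"${line%%[![:space:]]*}"}"
  line="${line%"${line##*[![:space:]]}"}"
  printf '%s' "${line}"
}

clean_subject "  Fix: Update mock files  "   # prints "Update mock files"
```

A line without any of the labels passes through with only the whitespace removed.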
Improvement: Token Limit
The git diff might be too long, so we must limit it to a certain length.
We are using the most powerful model, text-davinci-003. Looking at https://beta.openai.com/docs/models/gpt-3 that model handles at most 4000 tokens. Those 4000 tokens include both the prompt and the completion, so if we limit the diff to 3000 tokens and keep the completion at 256 tokens (the max_tokens value in the request) we should be fine:
FULL_DIFF="$(git diff --cached --no-color)"
TOKEN_LIMIT="3000"
CHAR_LIMIT=$((TOKEN_LIMIT * 36999 / 13914))
DIFF="${FULL_DIFF:0:${CHAR_LIMIT}}"
Tokens are something in between characters and words. I came up with the ratio 36999 / 13914 (roughly 2.66 characters per token) by using https://beta.openai.com/tokenizer on a real git diff example.
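The arithmetic is easy to check in isolation. The sketch below wraps it in two helpers. The names chars_for_tokens and truncate_to_tokens are hypothetical, and I am assuming the two numbers in the ratio are the character count and the token count that the tokenizer reported:

```bash
#!/usr/bin/env bash

# Ratio from the text, assumed to mean 36999 characters for 13914 tokens.
RATIO_CHARS=36999
RATIO_TOKENS=13914

# Hypothetical helper: convert a token budget into a character budget.
chars_for_tokens() {
  local tokens="${1}"
  # Bash integer arithmetic truncates the fractional part.
  echo $(( tokens * RATIO_CHARS / RATIO_TOKENS ))
}

# Hypothetical helper: cut a string down to roughly the given token budget.
truncate_to_tokens() {
  local text="${1}"
  local limit
  limit="$(chars_for_tokens "${2}")"
  printf '%s' "${text:0:${limit}}"
}

chars_for_tokens 3000   # prints 7977
```

So a 3000 token budget keeps at most the first 7977 characters of the diff. The cut is blind, which means it can end in the middle of a line.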
Improvement: Haiku
We can also ask GPT-3 to write a Haiku describing the git diff and place that in the extended git message, just by adding "Then write a haiku that describes the changes made." to the prompt.
I tried other forms of lyrics such as "Heroic Ballad" and "K-Pop Song". After some usage I realized the output became too long: too long for me to read every time, and too much clutter in the git log command output for colleagues.
The Haiku offers a short moment of meditation. You can take a deep breath. Briefly digest the changes you made.
Final Script
#!/usr/bin/env bash
# https://beta.openai.com/account/api-keys
OPENAI_TOKEN="..."
FULL_DIFF="$(git diff --cached --no-color)"
TOKEN_LIMIT="3000"
CHAR_LIMIT=$((TOKEN_LIMIT * 36999 / 13914))
DIFF="${FULL_DIFF:0:${CHAR_LIMIT}}"
PROMPT="===== BEGIN GIT DIFF =====
${DIFF}
===== END GIT DIFF =====
Write a git commit message for this diff.
Then write a haiku that describes the changes made.
Generated Commit Message:
"
PROMPT_JSONENCODED="$(echo -n "${PROMPT}" | jq -Rsa .)"
REQUEST_JSON="{
\"model\": \"text-davinci-003\",
\"prompt\": ${PROMPT_JSONENCODED},
\"max_tokens\": 256,
\"temperature\": 0.7
}"
RESPONSE_JSON="$(curl --silent --url "https://api.openai.com/v1/completions" --header "Authorization: Bearer ${OPENAI_TOKEN}" --header "Content-Type: application/json" --data-raw "${REQUEST_JSON}")"
RESPONSE_TEXT="$(printf "%s" "${RESPONSE_JSON}" | jq --raw-output ".choices[0].text")"
function trim {
local var="${*}"
# remove leading whitespace characters
var="${var#"${var%%[![:space:]]*}"}"
# remove trailing whitespace characters
var="${var%"${var##*[![:space:]]}"}"
echo -n "${var}"
}
function remove_prefix {
local STRING="${1}"
local PREFIX="${2}"
echo "${STRING#"${PREFIX}"}"
}
FIRST_LINE="$(echo "${RESPONSE_TEXT}" | head -n 1)"
FIRST_LINE="$(trim "${FIRST_LINE}")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "Fix:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "fix:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "Refactor:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "refactor:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "Chore:")"
FIRST_LINE="$(remove_prefix "${FIRST_LINE}" "chore:")"
FIRST_LINE="$(trim "${FIRST_LINE}")"
REST="$(echo "${RESPONSE_TEXT}" | tail -n +2)"
REST="$(trim "${REST}")"
REST="$(remove_prefix "${REST}" "Haiku:")"
REST="$(remove_prefix "${REST}" "haiku:")"
REST="$(trim "${REST}")"
RESULT="${FIRST_LINE}
Haiku:
${REST}
(Automatic commit message by OpenAI)"
git commit -m "${RESULT}"
Usage Examples
Here are a few usage examples based on real commits made to open source repositories:
https://github.com/flutter/flutter/commit/f11fbbafca425706f811e730c581eb0a3a723823
Fix entitlements for macos debug profile
Haiku:
Entitlements updated
Debug profile now moored tight
Sandbox can now sleep
(Automatic commit message by OpenAI)
https://github.com/home-assistant/core/commit/eae81547531cb830eff0eed791133a93c3735e61
Add fixtures and tests for cover node percentage
Haiku:
Cover child node added
Tests for opening, closing, stop
Position set to fifty
(Automatic commit message by OpenAI)
https://github.com/kubernetes/kubernetes/commit/ff7ba89b1cdcd22d31641951489f317ead1acc9f
Update mock files
Haiku:
Mocks were out of date
Quickly update them all now
Everything is fresh
(Automatic commit message by OpenAI)
Usage Considerations
For really large commits we'll use about 3000 tokens per API call. Looking at https://openai.com/api/pricing/ that would be $0.06 per commit message. I make 30 commits on a really busy day, so that's around $2 daily for me in the worst case.
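If you want to plug in your own numbers, the estimate is a one-liner. This is only a sketch: daily_cost_cents is a hypothetical helper, and the price of 2 cents per 1000 tokens is my assumption, derived from the $0.06 for 3000 tokens above, so check the pricing page for current values:

```bash
#!/usr/bin/env bash

# Assumed price: $0.06 for 3000 tokens, i.e. 2 cents per 1000 tokens.
CENTS_PER_1000_TOKENS=2

# Hypothetical helper: worst-case daily cost in cents.
daily_cost_cents() {
  local tokens_per_commit="${1}"
  local commits_per_day="${2}"
  echo $(( tokens_per_commit * commits_per_day * CENTS_PER_1000_TOKENS / 1000 ))
}

daily_cost_cents 3000 30   # prints 180, i.e. $1.80
```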
The commit messages are not 100% accurate. Half of the time they are actually great. The other half they are outright lies that don't reflect the work done whatsoever. Depending on where you work this may or may not matter. Will your colleagues actually read your commit messages anyways?
The API call takes around 3 seconds. About the time it would take me to write an equally poor commit message. But with OpenAI I don't have to think and I get that nice Haiku as a bonus!