Vim Digraphs: The Unnecessarily Detailed Guide

In Vim, in Insert mode, if you type Ctrl-K followed by two characters, you can insert a special Unicode character corresponding to those two characters. The two-character combo is often easier to remember than other codes. These are called digraphs.

For example, Ctrl-K12 will insert ½.

Everything below this is unnecessary detail.

What are all the digraphs?

I used to think the big table at :help digraph-table, at the bottom of Vim’s help page for digraphs, was exhaustive. As the page explains, the mnemonics are based on RFC1345, an informational Internet RFC that provides two-character mnemonics for various characters.

To locate ourselves on a timeline: this RFC is from June 1992. To define what a “character” is, it refers to “ISO 2DIS 10646”. That’s a version[1] of ISO 10646, a standard which had sort of competed with Unicode for a while before they became unified in 1991, shortly before our RFC. Of course, these standard(s) hadn’t “won” at the time; single-byte encodings with all kinds of competing codepages were all the rage still. I assume this is the time about which Joel Spolsky wrote that Americans’ résumés would arrive in Israel as rגsumגs.

However, at some point, I accidentally discovered that you can use the digraph 22 for the superscript 2 character, ². This is not in RFC1345 or on that help page! It does get printed by :digraphs, but it’s harder to find, and since it’s easier to type than the help page’s 2S, I wanted to learn what other digraphs there were that I didn’t know about.

In the Vim source code (or Neovim), the 22 digraph and several others have a comment saying, “Vim 5.x compatible”. Vim 5.0 was released in February 1998. But Vim had digraphs before then; the comment means that the digraphs without the comment were added after Vim 5.x: in September 2001, when Vim 6.0 was released. Vim’s documentation notes as much, in a short section under :help incompatible-6:

The default digraphs now correspond to RFC1345. This is very different from what was used in Vim 5.x.

(Understandably, this section is not in Neovim’s docs.)

To track down our mystery digraphs, we need to go further back in Vim history. This is a little tricky. The (now-)official Vim git repo only goes as far back as 7.0[2]; even the official Vim source mirrors only go back to 3.0.

Fortunately, the vim-history GitHub repo has older versions, including version notes[3] saying that digraphs were added in Vim 1.18. No release date for 1.18 is documented, but the repo lists release dates for 1.17 in April 1992 and for 1.27 in April 1993, so it seems likely that 1.18 came out mid-1992. Interestingly, the version notes say that “CTRL-K for entering digraphs” was only added in version 1.25. Before then, you could only use digraphs with the “one character, backspace, the other character” song and dance (which might have been helpfully analogous to a typewriter for some users), and only if digraphs were specifically enabled.[4]

Anyway, the 1.24 version of digraph.c is the earliest source code I could find, and there it is: the 22 digraph for ².

While we’re randomly rabbit-holing

I couldn’t help but notice the contributors listed at the top of the file:

Bram Moolenaar          [email protected]
Tim Thompson            twitch!tjt
Tony Andrews            onecom!wldrdg!tony 
G. R. (Fred) Walter     watmath!watcgl!grwalter

You know what other standard we take for granted today hadn’t “won” yet? Email addresses. Bram Moolenaar had an email address, but the other three contributors’ contact information was provided as UUCP bang paths.

A short history of digraphs and code pages

If you trace the file a bit further, you find that the next commit we have, 1.27, switches to a uuencoded file, digraph.c.uue. Shortly after typing this out, I realized that I don’t actually know whether the file was distributed that way back in 1993 or whether this was the archivist’s decision. (The repository history is clearly post hoc, since Git itself came out in 2005.) I considered checking every commit to record every digraph change, but quickly decided that would be far too time-consuming. Let’s just check in on digraph.c at the start of each major version.

  • In v2.0 (December 1993) Vim can be compiled with “standard MSDOS digraphs” or “standard ISO digraphs”. It looks like “standard MSDOS digraphs” use the same mnemonics applied to Code page 437.

  • In v3.0 (August 1994) the digraphs are mostly the same as v2.0. (I couldn’t figure out why git blame shows a large change.)

  • In v4.0 (May 1996) Vim gained “HPUX digraphs” for HP Roman.

  • In v5.0 (February 1998), under an ATARI MiNT preprocessor guard, Vim gained “ATARI digraphs” for the Atari ST character set.

  • In v6.0 (September 2001), Vim gained digraphs for EBCDIC and MACOS (Mac OS Roman), in addition to the modern RFC1345 digraphs. The old digraphs that didn’t conflict with the new ones were generally kept around (though see below), but all old digraphs were still available under an OLD_DIGRAPHS preprocessor guard. There were also a few tweaks and additions to those old default digraphs in v5.2: AA/aa were added next to A@/a@ for Å/å, and a few digraphs for characters where ISO 8859-15 diverged from ISO 8859-1 were added.

  • In v7.0 (May 2006) the digraphs are mostly the same as v6.0. From this point on, though, Vim started to shed support for operating systems and their code pages as they faded into obscurity. (Also, we have real git commits and it’s easier to find major changes.) Vim dropped MS-DOS in 7.4.1399, MACOS in 8.1.0805, MiNT in 8.2.1215, EBCDIC in 8.2.4273, and OLD_DIGRAPHS and HPUX_DIGRAPHS in 9.0.0328. On the other hand, it also gained new default digraphs in tiny batches:

    • =e for € EURO SIGN in 7.0017,
    • Eu for € EURO SIGN (again) in 7.0146,
    • =R and =P for ₽ RUBLE SIGN (called ROUBLE SIGN in the source code) in 7.4.335,
    • ,. for … HORIZONTAL ELLIPSIS in 8.0.0062,
    • W`, w`, Y`, and y` for some letters with graves (Ẁ, ẁ, Ỳ, ỳ) in 8.0.0749,
    • oo for • BULLET in 8.2.1635,
    • 4' for ⁗ QUADRUPLE PRIME in 9.0.2056, and
    • .= for ≐ APPROACHES THE LIMIT in 9.1.1065.

    That last one was committed in February this year, after I started writing this post. I have a few dozen custom digraphs in my vimrc file; maybe only a handful are digraphs I could imagine being broadly useful and memorable, but I have never once considered the option of trying to upstream any of them to every Vim user ever. I admire the hustle.

I’ve given up on documenting the digraphs for every now-dead codepage, not only because there are so many but because it would often be anachronistic to describe “what character” each digraph was for in modern terms. Unicode still hadn’t won! In pursuit of economy, several code pages deliberately conflate the German eszett ß with the Greek beta β. HPUX had digraphs L- and L= for one-bar £ and two-bar ₤ pound signs, even though today they’re considered typographical variations; in fact, the HP Roman codepage is the reason Unicode has a separate ₤ character.[5] MACOS had the AP digraph for the Apple logo, which is a private-use character, U+F8FF, today. Are you on a macOS device?

For a similar rabbit hole, you can read about the “small house” in Code Page 437 on Glyph Drawing Club.

Every old Vim digraph

Here are the final “old Vim digraphs” before they were removed in 9.0.0328, and each of their ultimate fates. I also provide the new, RFC-compliant digraph for the character each old digraph was for. The unintuitive digraphs, like e= mapping to U+00A4, are related to the ISO 8859-15 divergence.

Surprisingly, four digraphs — "", rO, ,,, and i" — were not preserved even though they don’t conflict with any RFC digraphs, and even though some similar digraphs (cO for ©, a" for ä, e" for ë, etc.) were preserved.

Digraph Old meaning New meaning RFC
~! U+00A1 ¡ 🧊 preserved !I
c| U+00A2 ¢ 🧊 preserved Ct
$$ U+00A3 £ 🧊 preserved Pd
e= U+00A4 ¤ 🔀 remapped U+0435 е Cu
ox U+00A4 ¤ 🧊 preserved Cu
Y- U+00A5 ¥ 🧊 preserved Ye
|| U+00A6 ¦ 🧊 preserved BB
pa U+00A7 § 🔀 remapped U+3071 ぱ SE
"" U+00A8 ¨ ⛔ removed ':
cO U+00A9 © 🧊 preserved Co
a- U+00AA ª 🔀 remapped U+0101 ā -a
<< U+00AB « ✅ RFC-compliant
-, U+00AC ¬ 🧊 preserved NO
-- U+00AD ­ ✅ RFC-compliant
rO U+00AE ® ⛔ removed Rg
-= U+00AF ¯ 🧊 preserved 'm
~o U+00B0 ° 🧊 preserved DG
+- U+00B1 ± ✅ RFC-compliant
22 U+00B2 ² 🧊 preserved 2S
33 U+00B3 ³ 🧊 preserved 3S
'' U+00B4 ´ ✅ RFC-compliant
ju U+00B5 µ 🔀 remapped U+044E ю My
pp U+00B6 ¶ 🧊 preserved PI
~. U+00B7 · 🧊 preserved .M
,, U+00B8 ¸ ⛔ removed ',
11 U+00B9 ¹ 🧊 preserved 1S
o- U+00BA º 🔀 remapped U+014D ō -o
>> U+00BB » ✅ RFC-compliant
14 U+00BC ¼ ✅ RFC-compliant
12 U+00BD ½ ✅ RFC-compliant
34 U+00BE ¾ ✅ RFC-compliant
~? U+00BF ¿ 🧊 preserved ?I
A` U+00C0 À 🧊 preserved A!
A' U+00C1 Á ✅ RFC-compliant
A^ U+00C2 Â 🧊 preserved A>
A~ U+00C3 Ã 🧊 preserved A?
A" U+00C4 Ä 🧊 preserved A:
A@ U+00C5 Å 🧊 preserved AA
AA U+00C5 Å ✅ RFC-compliant
AE U+00C6 Æ ✅ RFC-compliant
C, U+00C7 Ç ✅ RFC-compliant
E` U+00C8 È 🧊 preserved E!
E' U+00C9 É ✅ RFC-compliant
E^ U+00CA Ê 🧊 preserved E>
E" U+00CB Ë 🧊 preserved E:
I` U+00CC Ì 🧊 preserved I!
I' U+00CD Í ✅ RFC-compliant
I^ U+00CE Î 🧊 preserved I>
I" U+00CF Ï 🧊 preserved I:
D- U+00D0 Ð ✅ RFC-compliant
N~ U+00D1 Ñ 🧊 preserved N?
O` U+00D2 Ò 🧊 preserved O!
O' U+00D3 Ó ✅ RFC-compliant
O^ U+00D4 Ô 🧊 preserved O>
O~ U+00D5 Õ 🧊 preserved O?
O" U+00D6 Ö 🔀 remapped U+0150 Ő O:
/\ U+00D7 × 🧊 preserved *X
OE U+00D7 × 🔀 remapped U+0152 Œ *X
O/ U+00D8 Ø ✅ RFC-compliant
U` U+00D9 Ù 🧊 preserved U!
U' U+00DA Ú ✅ RFC-compliant
U^ U+00DB Û 🧊 preserved U>
U" U+00DC Ü 🔀 remapped U+0170 Ű U:
Y' U+00DD Ý ✅ RFC-compliant
Ip U+00DE Þ 🧊 preserved TH
ss U+00DF ß ✅ RFC-compliant
a` U+00E0 à 🧊 preserved a!
a' U+00E1 á ✅ RFC-compliant
a^ U+00E2 â 🧊 preserved a>
a~ U+00E3 ã 🧊 preserved a?
a" U+00E4 ä 🧊 preserved a:
a@ U+00E5 å 🧊 preserved aa
aa U+00E5 å ✅ RFC-compliant
ae U+00E6 æ ✅ RFC-compliant
c, U+00E7 ç ✅ RFC-compliant
e` U+00E8 è 🧊 preserved e!
e' U+00E9 é ✅ RFC-compliant
e^ U+00EA ê 🧊 preserved e>
e" U+00EB ë 🧊 preserved e:
i` U+00EC ì 🧊 preserved i!
i' U+00ED í ✅ RFC-compliant
i^ U+00EE î 🧊 preserved i>
i" U+00EF ï ⛔ removed i:
d- U+00F0 ð ✅ RFC-compliant
n~ U+00F1 ñ 🧊 preserved n?
o` U+00F2 ò 🧊 preserved o!
o' U+00F3 ó ✅ RFC-compliant
o^ U+00F4 ô 🧊 preserved o>
o~ U+00F5 õ 🧊 preserved o?
o" U+00F6 ö 🔀 remapped U+0151 ő o:
:- U+00F7 ÷ ⛔/✅ effectively RFC-compliant -:
oe U+00F7 ÷ 🔀 remapped U+0153 œ -:
o/ U+00F8 ø ✅ RFC-compliant
u` U+00F9 ù 🧊 preserved u!
u' U+00FA ú ✅ RFC-compliant
u^ U+00FB û 🧊 preserved u>
u" U+00FC ü 🔀 remapped U+0171 ű u:
y' U+00FD ý ✅ RFC-compliant
ip U+00FE þ ⛔/🔀 removed (RFC pi = ぴ) th
y" U+00FF ÿ 🧊 preserved y:
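To make the four fates in the table above concrete, here is a minimal sketch of how an old digraph could be classified by comparing three lookup tables. The tables are tiny, hand-copied excerpts of rows from the table (not Vim’s or RFC1345’s full data), fate is a hypothetical helper, and the split rows such as :- and ip are deliberately not modeled.

```javascript
// Minimal sketch: classify an old Vim digraph's fate, as in the table above.
// The Maps are small hand-copied excerpts (an assumption), not full data sets.

// Old digraphs (before 9.0.0328): digraph -> code point.
const oldDigraphs = new Map([
  ["12", 0x00bd], // ½
  ["22", 0x00b2], // ²
  ["e=", 0x00a4], // ¤
  ['""', 0x00a8], // ¨
]);

// RFC1345 mnemonics (excerpt).
const rfcDigraphs = new Map([
  ["12", 0x00bd], // ½
  ["2S", 0x00b2], // ²
  ["Cu", 0x00a4], // ¤
  ["':", 0x00a8], // ¨
  ["e=", 0x0435], // Cyrillic е
]);

// Current defaults: the RFC set plus the old digraphs Vim kept (excerpt).
const currentDigraphs = new Map([...rfcDigraphs, ["22", 0x00b2]]);

// Hypothetical helper: returns one of the four fates, or undefined if the
// digraph was never an old digraph.
function fate(digraph) {
  const oldCp = oldDigraphs.get(digraph);
  if (oldCp === undefined) return undefined;
  if (rfcDigraphs.get(digraph) === oldCp) return "RFC-compliant";
  if (!currentDigraphs.has(digraph)) return "removed";
  return currentDigraphs.get(digraph) === oldCp ? "preserved" : "remapped";
}

for (const d of oldDigraphs.keys()) console.log(d, fate(d));
// 12 RFC-compliant
// 22 preserved
// e= remapped
// "" removed
```

The order of the checks matters: “RFC-compliant” is tested first because an RFC digraph that happens to match the old meaning would otherwise also look “preserved”.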

All remaining default digraphs that aren’t from the RFC were added from 7.0 onward and are mentioned above, though how Vim’s digraphs help page treats them varies. That page documents one subset of them in the big :help digraph-table table that I originally hoped was exhaustive, and calls out a different subset in the prose as additions to RFC1345. That said, 9.1.1065, the most recent commit to add a digraph, also revised the prose to say that the default digraphs come from “RFC1345 mnemonics (with some additions)”, so arguably every addition is now covered.

Digraph Meaning Doc in table? Doc in prose?
W` U+1E80 Ẁ 🚫 no 🚫 no
w` U+1E81 ẁ 🚫 no 🚫 no
Y` U+1EF2 Ỳ 🚫 no 🚫 no
y` U+1EF3 ỳ 🚫 no 🚫 no
oo U+2022 • ✅ yes 🚫 no
,. U+2026 … ✅ yes 🚫 no
4' U+2057 ⁗ ✅ yes ✅ yes
=e U+20AC € 🚫 no ✅ yes
Eu U+20AC € ✅ yes ✅ yes
=P U+20BD ₽ ✅ yes ✅ yes
=R U+20BD ₽ ✅ yes ✅ yes
.= U+2250 ≐ ✅ yes 🤷

How do you implement digraphs in the wild?

(Really stretching the concept of relevance here.)

I have a problem where I get used to tiny conveniences on some computers I own and then get annoyed that I don’t have them on other devices or operating systems. I’m also enough of a web dev that I can fix this. Digraphs are easy enough: in a text area, capture the keydown event. If it’s Ctrl-K, activate digraph mode, then capture and preventDefault the next two keydown events. After the second such event, look up the digraph and insert the result at the cursor.

textarea.addEventListener('keydown', function(e) {
    if (e.ctrlKey && e.key === 'k') {
        e.preventDefault();
        isDigraphMode = true;
        // ...
    }
    if (isDigraphMode) {
        // look some stuff up with previous/current e.key
    }
});
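The handler above elides the bookkeeping. As a minimal sketch (not the author’s code), that bookkeeping can be pulled out of the DOM into a small state machine; createDigraphMachine, its { preventDefault, insert } result, and the textarea/DIGRAPHS names in the wiring comment are all hypothetical. Two details seem worth handling. First, modifier keys fire their own keydown events (with e.key values like "Shift"), so they must not count as digraph characters. Second, Vim also tries the two characters in swapped order, which is why :- in the table above is effectively RFC-compliant. Inserting the second character when nothing matches is an arbitrary choice here.

```javascript
// Minimal DOM-free sketch of Ctrl-K digraph handling (hypothetical names).
// `table` is a Map from two-character strings to the text to insert.
function createDigraphMachine(table) {
  let active = false; // set by Ctrl-K, cleared once two characters are read
  let first = null;   // first character of the pending digraph

  // Takes a keydown-like { key, ctrlKey } and says what the caller should do.
  return function handleKey({ key, ctrlKey = false }) {
    if (ctrlKey && key === "k") {
      active = true;
      first = null;
      return { preventDefault: true, insert: null };
    }
    // Outside digraph mode, or a named key such as "Shift" or "Control":
    // let the browser handle it and keep waiting.
    if (!active || key.length !== 1) {
      return { preventDefault: false, insert: null };
    }
    if (first === null) {
      first = key;
      return { preventDefault: true, insert: null };
    }
    // Like Vim, also try the swapped pair; otherwise insert the second key.
    const insert = table.get(first + key) ?? table.get(key + first) ?? key;
    active = false;
    first = null;
    return { preventDefault: true, insert };
  };
}

// Hypothetical browser wiring, assuming `textarea` and a `DIGRAPHS` Map exist:
// const handleKey = createDigraphMachine(DIGRAPHS);
// textarea.addEventListener("keydown", (e) => {
//   const { preventDefault, insert } = handleKey(e);
//   if (preventDefault) e.preventDefault();
//   if (insert !== null) {
//     textarea.setRangeText(insert, textarea.selectionStart,
//                           textarea.selectionEnd, "end");
//   }
// });
```

Using setRangeText with the "end" mode replaces the current selection and leaves the cursor after the inserted character, rather than appending to the end of the text area.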

Problem: I can’t Ctrl-K on my phone. Solution: Pick a different typeable key, say the backslash. (In practice I made it customizable.)

textarea.addEventListener('keydown', function(e) {
    if (e.ctrlKey && e.key === 'k'
            || (!isDigraphMode && e.key === "\\")) {
        e.preventDefault();
        isDigraphMode = true;
        // ...
    }
    if (isDigraphMode) {
        // look some stuff up with e.key and any buffered presses
    }
});

Surprisingly, although this works on my computer and in a few other desktop and mobile browsers I checked, it doesn’t work on Android Chrome, where e.key is always “Unidentified”. Some Stack Overflow questions, like “Why is ‘Unidentified’ returned on keyboard input on mobile?”, point to issue 118639 on the Chromium bug tracker, but it denies me access. According to a Wayback Machine snapshot, this bug was opened in March 2012 and a fix was merged in August 2014, but then the fix was reverted in April 2015 and the bug was marked as WontFix, with this explanation:

Too many non-western-keyboard use-cases depend on having “compose” key-codes instead of guessed “proper” ones. We’ve tried fixes that compromise but they never get all the edge-cases.

The bottom line is that if you want your keyboard-knowledgeable web page to work with Chrome, you’re going to have to be tolerant of IME’s idiosyncrasies with regard to key-codes.

The best workaround I found is to capture the beforeinput event instead, whose meaning is arguably simpler in a world with so many input methods: “if you were about to insert this text, don’t.”

textarea.addEventListener('beforeinput', function(e) {
    if (e.data === "\\") {
        e.preventDefault();
        isDigraphMode = true;
        // ...
    }
});
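One difference from keydown is that beforeinput’s e.data is a string that isn’t guaranteed to be a single character, and it is null for some input types, such as deletions. Here is a minimal, DOM-free sketch that walks e.data one character at a time and returns the text that should actually be inserted. createBeforeInputDigraphs, the DIGRAPHS Map, and the wiring are hypothetical; the backslash trigger follows the post, and inserting the two typed characters unchanged when no digraph matches is an arbitrary choice.

```javascript
// Minimal DOM-free sketch of trigger-key digraphs driven by beforeinput data.
// `table` maps two-character strings to replacements; `trigger` starts one.
function createBeforeInputDigraphs(table, trigger = "\\") {
  let active = false; // true after the trigger, until two characters arrive
  let pending = [];   // characters collected since the trigger

  // Returns the text to insert instead of `data` (null: leave the event alone).
  return function handleData(data) {
    if (data == null) return null; // e.g. deletions carry no data
    let out = "";
    for (const ch of data) { // data may contain more than one character
      if (!active) {
        if (ch === trigger) active = true; // swallow the trigger itself
        else out += ch;
        continue;
      }
      pending.push(ch);
      if (pending.length === 2) {
        const pair = pending.join("");
        out += table.get(pair) ?? pair; // unknown digraph: keep what was typed
        active = false;
        pending = [];
      }
    }
    return out;
  };
}

// Hypothetical browser wiring, assuming `textarea` and a `DIGRAPHS` Map exist:
// const handleData = createBeforeInputDigraphs(DIGRAPHS);
// textarea.addEventListener("beforeinput", (e) => {
//   const text = handleData(e.data);
//   if (text === null || text === e.data) return; // let the browser insert it
//   e.preventDefault();
//   textarea.setRangeText(text, textarea.selectionStart,
//                         textarea.selectionEnd, "end");
// });
```

Because the trigger only counts outside digraph mode, a backslash typed as one of the two digraph characters is looked up like any other character, mirroring the !isDigraphMode check in the keydown version.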

Anyway I now have a digraphs.html with an obnoxious number of custom digraphs I can use anywhere I have an internet connection and a modern JavaScript-enabled browser.
