In Vim, in Insert mode, if you type Ctrl-K followed by two characters, you can insert a special Unicode character corresponding to those two characters. The two-character combo is often easier to remember than other codes. These are called digraphs.
For example, Ctrl-K 12 will insert ½.
Everything below this is unnecessary detail.
What are all the digraphs?
I used to think the big table (:help digraph-table) at the bottom of Vim’s help page for digraphs was exhaustive. As the page explains, the mnemonics are based on RFC1345, an Internet standard that provides two-character mnemonics for various characters.
To locate ourselves on a timeline: this RFC is from June 1992. To define what a “character” is, it refers to “ISO 2DIS 10646”. That’s a version[1] of ISO 10646, a standard which had sort of competed with Unicode for a while before they became unified in 1991, shortly before our RFC. Of course, these standard(s) hadn’t “won” at the time; single-byte encodings with all kinds of competing codepages were all the rage still. I assume this is the time about which Joel Spolsky wrote that Americans’ résumés would arrive in Israel as rגsumגs.
However, at some point, I accidentally discovered that you can use the digraph 22 for the superscript 2 character, ². This is not in RFC1345 or on that help page! It does get printed by :digraphs, but it’s harder to find, and since it’s easier to type than the help page’s 2S, I wanted to learn what other digraphs there were that I didn’t know about.
In the Vim source code (or Neovim), the 22 digraph and several others have a comment saying, “Vim 5.x compatible”. Vim 5.0 was released in February 1998. But Vim had digraphs before then; the comment means that the digraphs without the comment were added after Vim 5.x: in September 2001, when Vim 6.0 was released. Vim’s documentation notes as much, in a short section under :help incompatible-6:
The default digraphs now correspond to RFC1345. This is very different from what was used in Vim 5.x.
(Understandably, this section is not in Neovim’s docs.)
To track down our mystery digraphs, we need to go further back in Vim history. This is a little tricky. The (now-)official Vim git repo only goes as far back as 7.0[2]; even the official Vim source mirrors only go back to 3.0.
Fortunately, the vim-history GitHub repo has older versions, including version notes[3] saying that digraphs were added in Vim 1.18. No release date for 1.18 is documented, but the repo lists release dates for 1.17 in April 1992 and for 1.27 in April 1993, so it seems likely that 1.18 came out mid-1992. Interestingly, the version notes say that “CTRL-K for entering digraphs” was only added in version 1.25. Before then, you could only use digraphs with the “one character, backspace, the other character” song and dance (which might have been helpfully analogous to a typewriter for some users), and only if digraphs were specifically enabled.[4]
Anyway, the 1.24 version of digraphs is the earliest source code I could find, and there it is, the 22 digraph for ².
While we’re randomly rabbit-holing, I couldn’t help but notice the contributors at the top of the file:
Bram Moolenaar [email protected]
Tim Thompson twitch!tjt
Tony Andrews onecom!wldrdg!tony
G. R. (Fred) Walter watmath!watcgl!grwalter
You know what other standard we take for granted today hadn’t “won” yet? Email addresses. Bram Moolenaar had an email address, but the other three contributors’ contact information was provided as UUCP bang paths.
A short history of digraphs and code pages
If you trace the file a bit further, you find that the next commit we have, 1.27, switches to a file digraph.c.uue. I realized shortly after typing this out that I don’t actually know whether the file was provided as such back in 1993 or whether this was the archivist’s decision. (The repository history is clearly post-hoc since Git itself came out in 2005.) I considered checking every commit to write down every digraph change, but quickly decided that that would be far too time-consuming. Let’s just check in on digraph.c at the start of each major version.
In v2.0 (December 1993) Vim can be compiled with “standard MSDOS digraphs” or “standard ISO digraphs”. It looks like “standard MSDOS digraphs” use the same mnemonics applied to Code page 437.
In v3.0 (August 1994) the digraphs are mostly the same as v2.0. (I couldn’t figure out why git blame shows a large change.)
In v5.0 (February 1998), under an ATARI MiNT preprocessor guard, Vim gained “ATARI digraphs” for the Atari ST character set.
In v6.0 (September 2001), Vim gained digraphs for EBCDIC and MACOS (Mac OS Roman), in addition to the modern RFC1345 digraphs. The old digraphs that didn’t conflict with the new ones were generally kept around (though see below), but all old digraphs were still available under an OLD_DIGRAPHS preprocessor guard. There were also a few tweaks and additions to those old default digraphs in v5.2: AA/aa were added next to A@/a@ for Å/å, and a few digraphs for characters where ISO 8859-15 diverged from ISO 8859-1 were added.
In v7.0 (May 2006) the digraphs are mostly the same as v6.0. From this point on, though, Vim started to shed support for operating systems and their code pages as they faded into obscurity. (Also, we have real git commits and it’s easier to find major changes.) Vim dropped MS-DOS in 7.4.1399, MACOS in 8.1.0805, MiNT in 8.2.1215, EBCDIC in 8.2.4273, and OLD_DIGRAPHS and HPUX_DIGRAPHS in 9.0.0328. On the other hand, it also gained new default digraphs in tiny batches:
=e for € EURO SIGN in 7.0017,
Eu for € EURO SIGN (again) in 7.0146,
=R and =P for ₽ RUBLE SIGN (called ROUBLE SIGN in the source code) in 7.4.335,
,. for … HORIZONTAL ELLIPSIS in 8.0.0062,
W`, w`, Y`, and y` for some letters with graves (Ẁ, ẁ, Ỳ, ỳ) in 8.0.0749,
oo for • BULLET in 8.2.1635,
4' for ⁗ QUADRUPLE PRIME in 9.0.2056, and
.= for ≐ APPROACHES THE LIMIT in 9.1.1065.
That last one was committed in February this year, after I started writing this post. I have a few dozen custom digraphs in my vimrc file; maybe only a handful are digraphs I could imagine being broadly useful and memorable, but I have never once considered the option of trying to upstream any of them to every Vim user ever. I admire the hustle.
I’ve given up on documenting the digraphs for every now-dead codepage, not only because there are so many but because it would often be anachronistic to describe “what character” each digraph was for in modern terms. Unicode still hadn’t won! In pursuit of economy, several code pages deliberately conflate the German eszett ß with the Greek beta β. HPUX had digraphs L- and L= for one-bar £ and two-bar ₤ pound signs, even though today they’re considered typographical variations; in fact, the HP Roman codepage is the reason Unicode has a separate ₤ character.[5] MACOS had the AP digraph for the Apple logo, which is a private-use character, U+F8FF, today. Are you on a macOS device?
For a similar rabbit hole, you can read about the “small house” in Code Page 437 on Glyph Drawing Club.
Every old Vim digraph
Here are the final “old Vim digraphs” before they were removed in 9.0.0328, and each of their ultimate fates. I also provide the new, RFC-compliant digraph for the character each old digraph was for. The unintuitive digraphs, like e= mapping to U+00A4, are related to the ISO 8859-15 divergence.
Surprisingly, four digraphs ("", rO, ,, and i") were not preserved even though they don’t conflict with any RFC digraphs, and even though some similar digraphs (cO for ©, a" for ä, e" for ë, etc.) were preserved.
Digraph | Old meaning | New meaning | RFC |
---|---|---|---|
~! | U+00A1 ¡ | 🧊 preserved | !I |
c\| | U+00A2 ¢ | 🧊 preserved | Ct |
$$ | U+00A3 £ | 🧊 preserved | Pd |
e= | U+00A4 ¤ | 🔀 remapped U+0435 е | Cu |
ox | U+00A4 ¤ | 🧊 preserved | Cu |
Y- | U+00A5 ¥ | 🧊 preserved | Ye |
\|\| | U+00A6 ¦ | 🧊 preserved | BB |
pa | U+00A7 § | 🔀 remapped U+3071 ぱ | SE |
"" | U+00A8 ¨ | ⛔ removed | ': |
cO | U+00A9 © | 🧊 preserved | Co |
a- | U+00AA ª | 🔀 remapped U+0101 ā | -a |
<< | U+00AB « | ✅ RFC-compliant | |
-, | U+00AC ¬ | 🧊 preserved | NO |
-- | U+00AD | ✅ RFC-compliant | |
rO | U+00AE ® | ⛔ removed | Rg |
-= | U+00AF ¯ | 🧊 preserved | 'm |
~o | U+00B0 ° | 🧊 preserved | DG |
+- | U+00B1 ± | ✅ RFC-compliant | |
22 | U+00B2 ² | 🧊 preserved | 2S |
33 | U+00B3 ³ | 🧊 preserved | 3S |
'' | U+00B4 ´ | ✅ RFC-compliant | |
ju | U+00B5 µ | 🔀 remapped U+044E ю | My |
pp | U+00B6 ¶ | 🧊 preserved | PI |
~. | U+00B7 · | 🧊 preserved | .M |
,, | U+00B8 ¸ | ⛔ removed | ', |
11 | U+00B9 ¹ | 🧊 preserved | 1S |
o- | U+00BA º | 🔀 remapped U+014D ō | -o |
>> | U+00BB » | ✅ RFC-compliant | |
14 | U+00BC ¼ | ✅ RFC-compliant | |
12 | U+00BD ½ | ✅ RFC-compliant | |
34 | U+00BE ¾ | ✅ RFC-compliant | |
~? | U+00BF ¿ | 🧊 preserved | ?I |
A` | U+00C0 À | 🧊 preserved | A! |
A' | U+00C1 Á | ✅ RFC-compliant | |
A^ | U+00C2 Â | 🧊 preserved | A> |
A~ | U+00C3 Ã | 🧊 preserved | A? |
A" | U+00C4 Ä | 🧊 preserved | A: |
A@ | U+00C5 Å | 🧊 preserved | AA |
AA | U+00C5 Å | ✅ RFC-compliant | |
AE | U+00C6 Æ | ✅ RFC-compliant | |
C, | U+00C7 Ç | ✅ RFC-compliant | |
E` | U+00C8 È | 🧊 preserved | E! |
E' | U+00C9 É | ✅ RFC-compliant | |
E^ | U+00CA Ê | 🧊 preserved | E> |
E" | U+00CB Ë | 🧊 preserved | E: |
I` | U+00CC Ì | 🧊 preserved | I! |
I' | U+00CD Í | ✅ RFC-compliant | |
I^ | U+00CE Î | 🧊 preserved | I> |
I" | U+00CF Ï | 🧊 preserved | I: |
D- | U+00D0 Ð | ✅ RFC-compliant | |
N~ | U+00D1 Ñ | 🧊 preserved | N? |
O` | U+00D2 Ò | 🧊 preserved | O! |
O' | U+00D3 Ó | ✅ RFC-compliant | |
O^ | U+00D4 Ô | 🧊 preserved | O> |
O~ | U+00D5 Õ | 🧊 preserved | O? |
O" | U+00D6 Ö | 🔀 remapped U+0150 Ő | O: |
/\ | U+00D7 × | 🧊 preserved | *X |
OE | U+00D7 × | 🔀 remapped U+0152 Œ | *X |
O/ | U+00D8 Ø | ✅ RFC-compliant | |
U` | U+00D9 Ù | 🧊 preserved | U! |
U' | U+00DA Ú | ✅ RFC-compliant | |
U^ | U+00DB Û | 🧊 preserved | U> |
U" | U+00DC Ü | 🔀 remapped U+0170 Ű | U: |
Y' | U+00DD Ý | ✅ RFC-compliant | |
Ip | U+00DE Þ | 🧊 preserved | TH |
ss | U+00DF ß | ✅ RFC-compliant | |
a` | U+00E0 à | 🧊 preserved | a! |
a' | U+00E1 á | ✅ RFC-compliant | |
a^ | U+00E2 â | 🧊 preserved | a> |
a~ | U+00E3 ã | 🧊 preserved | a? |
a" | U+00E4 ä | 🧊 preserved | a: |
a@ | U+00E5 å | 🧊 preserved | aa |
aa | U+00E5 å | ✅ RFC-compliant | |
ae | U+00E6 æ | ✅ RFC-compliant | |
c, | U+00E7 ç | ✅ RFC-compliant | |
e` | U+00E8 è | 🧊 preserved | e! |
e' | U+00E9 é | ✅ RFC-compliant | |
e^ | U+00EA ê | 🧊 preserved | e> |
e" | U+00EB ë | 🧊 preserved | e: |
i` | U+00EC ì | 🧊 preserved | i! |
i' | U+00ED í | ✅ RFC-compliant | |
i^ | U+00EE î | 🧊 preserved | i> |
i" | U+00EF ï | ⛔ removed | i: |
d- | U+00F0 ð | ✅ RFC-compliant | |
n~ | U+00F1 ñ | 🧊 preserved | n? |
o` | U+00F2 ò | 🧊 preserved | o! |
o' | U+00F3 ó | ✅ RFC-compliant | |
o^ | U+00F4 ô | 🧊 preserved | o> |
o~ | U+00F5 õ | 🧊 preserved | o? |
o" | U+00F6 ö | 🔀 remapped U+0151 ő | o: |
:- | U+00F7 ÷ | ⛔/✅ effectively RFC-compliant | -: |
oe | U+00F7 ÷ | 🔀 remapped U+0153 œ | -: |
o/ | U+00F8 ø | ✅ RFC-compliant | |
u` | U+00F9 ù | 🧊 preserved | u! |
u' | U+00FA ú | ✅ RFC-compliant | |
u^ | U+00FB û | 🧊 preserved | u> |
u" | U+00FC ü | 🔀 remapped U+0171 ű | u: |
y' | U+00FD ý | ✅ RFC-compliant | |
ip | U+00FE þ | ⛔/🔀 removed (RFC pi = ぴ) | th |
y" | U+00FF ÿ | 🧊 preserved | y: |
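The fate column can be read as a small decision procedure. Here is a rough sketch of it, under the assumption (suggested by the :- and ip rows) that Vim falls back to the reversed pair when a digraph is not defined as typed. The function name, the category strings, and the tiny sample maps are all made up for illustration; they are not taken from the Vim source.

```javascript
// Hypothetical excerpts: digraph -> code point.
const RFC = new Map([
  ['12', 0x00bd], // ½
  ['-:', 0x00f7], // ÷
  ['O"', 0x0150], // Ő
  ['pi', 0x3074], // ぴ
]);
// Present-day defaults: the RFC table plus whatever old digraphs survived.
const CURRENT = new Map([...RFC, ['22', 0x00b2]]); // ²

// Classify what became of an old digraph that used to produce oldCode.
function classifyOldDigraph(digraph, oldCode, rfc = RFC, current = CURRENT) {
  const reversed = [...digraph].reverse().join('');
  if (rfc.get(digraph) === oldCode) return 'rfc-compliant';
  if (current.get(digraph) === oldCode) return 'preserved';
  if (current.has(digraph)) return 'remapped';
  // Not defined as typed any more; see what the reversed pair would give.
  if (current.get(reversed) === oldCode) return 'removed, still works reversed';
  if (current.has(reversed)) return 'removed, reversed pair means something else';
  return 'removed';
}
```

With these sample maps, 22 comes out as preserved, O" as remapped, :- as removed but still working via -:, and ip as removed with its reversal pi meaning something else.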
All remaining default digraphs that aren’t from the RFC were added after 7.0 and mentioned above, though how Vim’s digraphs help page discusses them varies. That page documents a subset of those digraphs in the big :help digraph-table table that I originally hoped was exhaustive, and calls out a different subset in the prose as additions to RFC1345. Though, 9.1.1065, the last commit to add a digraph, also revises the prose to say that the default digraphs come from “RFC1345 mnemonics (with some additions)”, so arguably that base is now fully covered.
Digraph | Meaning | Doc in table? | Doc in prose? |
---|---|---|---|
W` | U+1E80 Ẁ | 🚫 no | 🚫 no |
w` | U+1E81 ẁ | 🚫 no | 🚫 no |
Y` | U+1EF2 Ỳ | 🚫 no | 🚫 no |
y` | U+1EF3 ỳ | 🚫 no | 🚫 no |
oo | U+2022 • | ✅ yes | 🚫 no |
,. | U+2026 … | ✅ yes | 🚫 no |
4' | U+2057 ⁗ | ✅ yes | ✅ yes |
=e | U+20AC € | 🚫 no | ✅ yes |
Eu | U+20AC € | ✅ yes | ✅ yes |
=P | U+20BD ₽ | ✅ yes | ✅ yes |
=R | U+20BD ₽ | ✅ yes | ✅ yes |
.= | U+2250 ≐ | ✅ yes | 🤷 |
How do you implement digraphs in the wild?
(Really stretching the concept of relevance here.)
I have a problem where I will have tiny conveniences on some computers that I own and then get annoyed that I don’t have them on other devices or operating systems. I’m also enough of a web dev that I can fix this. Digraphs are easy enough: In a text area, capture the keydown event. If it’s Ctrl-K, activate digraph mode and capture and preventDefault the next two keydown events. After the second such event, look up the digraph and append it.
    textarea.addEventListener('keydown', function(e) {
      if (e.ctrlKey && e.key === 'k') {
        e.preventDefault();
        isDigraphMode = true;
        // ...
      }
      if (isDigraphMode) {
        // look some stuff up with previous/current e.key
      }
    });
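To make the elided parts concrete, here is a minimal sketch of the state machine with the DOM taken out, so it can be exercised on its own. DIGRAPHS, lookupDigraph, and createDigraphInput are hypothetical names, the table is a three-entry stand-in, and the reversed-pair lookup and "insert the second character on a miss" fallback reflect my understanding of Vim's behavior rather than anything in the handler above.

```javascript
// Hypothetical mini-table; a real one would hold hundreds of entries.
const DIGRAPHS = new Map([
  ['12', '½'],
  ['22', '²'],
  ['Eu', '€'],
]);

// Try the pair as typed, then reversed (assumed to mirror Vim's fallback).
function lookupDigraph(a, b, table = DIGRAPHS) {
  return table.get(a + b) ?? table.get(b + a) ?? null;
}

// Returns feed(event), which reports whether the key was consumed
// (so the caller should preventDefault) and what text, if any, to insert.
function createDigraphInput(table = DIGRAPHS) {
  let pending = null; // null: normal typing; array: keys buffered after Ctrl-K

  return function feed({ key, ctrlKey = false }) {
    if (pending === null) {
      if (ctrlKey && key === 'k') {
        pending = [];
        return { handled: true, insert: null };
      }
      return { handled: false, insert: null };
    }
    if (key === 'Escape') {
      pending = null; // cancel digraph mode
      return { handled: true, insert: null };
    }
    // Named keys such as "Shift" are longer than one character; swallow them.
    if ([...key].length !== 1) {
      return { handled: true, insert: null };
    }
    pending.push(key);
    if (pending.length < 2) {
      return { handled: true, insert: null };
    }
    const [a, b] = pending;
    pending = null;
    // On a miss, fall back to the second character.
    return { handled: true, insert: lookupDigraph(a, b, table) ?? b };
  };
}
```

In a keydown listener this would be used roughly as: call feed(e), call e.preventDefault() when handled is true, and when insert is non-null pass it to textarea.setRangeText(insert, textarea.selectionStart, textarea.selectionEnd, 'end').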
Problem: I can’t Ctrl-K on my phone. Solution: Pick a different typeable key, say the backslash. (In practice I made it customizable.)
    textarea.addEventListener('keydown', function(e) {
      if (e.ctrlKey && e.key === 'k'
          || (!isDigraphMode && e.key === "\\")) {
        e.preventDefault();
        isDigraphMode = true;
        // ...
      }
      if (isDigraphMode) {
        // look some stuff up with e.key and any buffered presses
      }
    });
Surprisingly, although this works on my computer and a few other desktop and mobile browsers I checked, it doesn’t work on Android Chrome, where e.key is always "Unidentified". Some Stack Overflow questions, like “Why is ‘Unidentified’ returned on keyboard input on mobile?”, point to issue 118639 on the Chromium bug tracker, but it denies access for me. According to a Wayback Machine snapshot, this bug was opened in March 2012 and a fix was merged in August 2014, but then the fix was reverted in April 2015 and the bug was marked as WontFix.
Too many non-western-keyboard use-cases depend on having “compose” key-codes instead of guessed “proper” ones. We’ve tried fixes that compromise but they never get all the edge-cases.
The bottom line is that if you want your keyboard-knowledgeable web page to work with Chrome, you’re going to have to be tolerant of IME’s idiosyncrasies with regard to key-codes.
The best resolution I found is to capture the beforeinput event instead, whose meaning is maybe simpler in a world with so many input methods: “if you would insert this text, don’t”.
    textarea.addEventListener('beforeinput', function(e) {
      if (e.data === "\\") {
        e.preventDefault();
        isDigraphMode = true;
        // ...
      }
    });
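For completeness, here is one way the rest of that handler might look, again as a sketch rather than what digraphs.html actually does. createBeforeInputHandler is a hypothetical name, the table is whatever Map of digraphs you supply, and the handler only relies on e.data and e.preventDefault(). It assumes the browser lets these beforeinput events be canceled, which may not hold for every input method.

```javascript
// Returns a beforeinput handler. The handler returns the text to insert once
// a digraph completes, or null when there is nothing to insert yet.
function createBeforeInputHandler(table, trigger = '\\') {
  let pending = null; // null: normal typing; string: text buffered so far

  return function onBeforeInput(e) {
    // e.data is null for deletions and similar non-text input; leave those alone.
    if (typeof e.data !== 'string' || e.data.length === 0) return null;

    if (pending === null) {
      if (e.data === trigger) {
        e.preventDefault(); // "if you would insert this text, don't"
        pending = '';
      }
      return null;
    }

    e.preventDefault();
    pending += e.data; // an input method may deliver several characters at once
    const chars = [...pending];
    if (chars.length < 2) return null;

    pending = null;
    const [a, b, ...rest] = chars;
    // Same assumed fallbacks as Vim: reversed pair, then the second character.
    const result = table.get(a + b) ?? table.get(b + a) ?? b;
    return result + rest.join('');
  };
}
```

The listener would then insert any non-null return value itself, for example with textarea.setRangeText(text, textarea.selectionStart, textarea.selectionEnd, 'end').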
Anyway, I now have a digraphs.html with an obnoxious number of custom digraphs I can use anywhere I have an internet connection and a modern JavaScript-enabled browser.