College Emails

(Frivolous blog content, posted as part of a daily posting streak I have openly committed to; standard disclaimers apply)

Out of boredom and curiosity, I graphed how many emails colleges sent me, excluding the colleges I actually applied to. I am being extremely polite and just calling them emails. I’ve wanted to make this for a long time, but it wasn’t until I saw this post about an email experiment on waxy.org/links that I understood which tools I could use to quantify my emails. (And then I actually made it and procrastinated posting it here for two months. If you look at my GitHub page or activity you might have seen it already, though. Oops.)

I don’t think the results were expected. Other than saying that, I leave the interpretation up to the reader because I’m on a tight blogging schedule. Cool? Cool.

Step-by-step instructions:

Step 1. Get offlineimap. If you’re on a Mac like me and you have Homebrew (highly recommended), you can install it with brew install offlineimap. Otherwise you’re on your own.

Step 2. Create a suitable .offlineimaprc. If you are a Gmail user like me, know that Gmail is so popular that offlineimap has a custom type for it:

[general]
accounts = Gmail

[Account Gmail]
localrepository = Local
remoterepository = Remote

[Repository Local]
type = Maildir
localfolders = ~/email

[Repository Remote]
type = Gmail
remoteuser = [email protected]
sslcacertfile = /usr/local/etc/openssl/osx_cert.pem
folderfilter = lambda x: x == 'foobar'

The names “Gmail”, “Local”, and “Remote” in the first three sections can be anything you like, as long as each section header matches the name that refers to it. But the type = Gmail in the description of the remote repository must be exact, including capitalization.

Put a directory path you like under localfolders for the local repository. Change your remoteuser to be your full Gmail account, of course.

Then you need an sslcacertfile, a certificate file that offlineimap uses to verify the server and secure your connection. I followed this StackOverflow answer blindly without really understanding what was happening.

If you just want emails under certain tags or folders, write a Python lambda under folderfilter that tests the tag string.
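As a rough sketch of what such a lambda could look like, here are a few predicates written as ordinary Python. offlineimap calls the function with each remote folder name and syncs the folder when the result is true. The label names below ('collegespam', 'INBOX') and the variable names are only examples, not anything from my actual setup.

```python
# Each function takes a remote folder (Gmail label) name and returns True
# if that folder should be synced. The label names are only examples.

# Sync exactly one label.
only_collegespam = lambda folder: folder == 'collegespam'

# Sync any label from a fixed list.
several = lambda folder: folder in ['collegespam', 'INBOX']

# Sync everything except Gmail's special "[Gmail]/..." folders.
skip_gmail_system = lambda folder: not folder.startswith('[Gmail]')
```

In .offlineimaprc, only the lambda expression itself goes after folderfilter =, for example folderfilter = lambda folder: folder in ['collegespam', 'INBOX'].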

You may also need to enable less secure apps on your Gmail account so that it lets offlineimap connect. You can change it back later once offlineimap is done syncing. To sync, just run offlineimap.

Step 3. Get mu. It’s also available on Homebrew. Then index the mail you just downloaded by running mu index --maildir ~/email

Step 4. Get the data out of mu and process it into a usable form. In my case, I mimicked the Medium post and composed a command-line one-liner to extract the domains from which the emails were sent and turn them into usable comma-separated data that can be thrown into things such as this random d3.js bar chart generator I found online:

mu find maildir:/collegespam --fields f | perl -ne '/<.*?@(.*?)>/; print "$1\n";' | sort | uniq -c | awk '{ print $2 "," $1 }'
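If the one-liner above is hard to read, here is roughly the same counting step as a Python sketch. It is not what I actually ran. The function names and the sample senders are made up for illustration, and unlike the one-liner it skips any line that has no <address> part instead of printing a line for it.

```python
import re
from collections import Counter

# Same pattern as the perl one-liner: the domain between "@" and ">".
SENDER_DOMAIN = re.compile(r'<.*?@(.*?)>')

def count_domains(from_lines):
    """Count sender domains in an iterable of From fields."""
    counts = Counter()
    for line in from_lines:
        match = SENDER_DOMAIN.search(line)
        if match:  # skip senders without an <address> part
            counts[match.group(1)] += 1
    return counts

def to_csv(counts):
    """Format counts as "domain,count" lines, sorted by domain."""
    return '\n'.join(f'{domain},{n}' for domain, n in sorted(counts.items()))

# Made-up sample input, not real data.
sample = [
    'Example College <admissions@example.edu>',
    'Example College <news@example.edu>',
    'Another School <info@another.example.org>',
]
print(to_csv(count_domains(sample)))
```

To use it on real mail, you could pass sys.stdin to count_domains in place of sample and pipe the output of mu find maildir:/collegespam --fields f into the script.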

Step 5. Label. I manually labeled the domain names with their colleges where it wasn’t obvious and sprinkled lots of other annotations onto the list because I was too lazy to automate this. I also added the ordinal scale coloring by top-level domain by myself.
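For what it’s worth, the coloring by top-level domain is the part that would have been easiest to automate. A minimal sketch, assuming a list of domains and a list of color names (both hypothetical, as are the function names), might look like this:

```python
def top_level_domain(domain):
    """Return the part after the last dot, e.g. 'edu' for 'example.edu'."""
    return domain.rsplit('.', 1)[-1].lower()

def assign_colors(domains, palette):
    """Map each distinct TLD to a palette color, in order of first appearance.

    Colors repeat if there are more TLDs than palette entries.
    """
    colors = {}
    for domain in domains:
        tld = top_level_domain(domain)
        if tld not in colors:
            colors[tld] = palette[len(colors) % len(palette)]
    return colors
```

Matching domain names to colleges is a different story; that still needs a hand-written lookup table, which is why I did it manually.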

Step 6. Avoid procrastination. Whoops. I procrastinated posting this, so I had to manually adjust the numbers while preserving the labels several times; I regret this. So don’t procrastinate, kids!

(note: the commenting setup here is experimental and I may not check my comments often; if you want to tell me something instead of the world, email me!)