Category → math

Signed Modulo

One thing many mathematically-inclined programmers run into when implementing mathematical algorithms, particularly number-theoretic ones, is that the modulo operation doesn’t behave how they expect or prefer.

In many languages, this operator is denoted %. Concretely, one might prefer that, if the second argument is positive, then the modulo operation would always give a nonnegative result. Under this behavior, the expression (-5) % 3 would evaluate to 1 rather than -2. This is a lot more useful for number theory because then for positive integers n, the % n operation actually maps integers to exactly the n canonical representatives for the residue classes. As a result, \(a \equiv b \mod n\) if and only if a % n == b % n. You can also do things like index into a length-n array with a % n and know that the index will be in-bounds. Finally, there are also optimization opportunities: modding by a power of 2 becomes equivalent to a simple bitwise AND, which is really fast on modern computers.

A few programming languages, notably Python, do implement % this way. However, the majority of languages today, including pretty much everything remotely descended from C, do not; instead, (-5) % 3 is -2. This post attempts to track down why.

The first thing to note is that there is a more important number-theoretic identity we’d like to have:

\[\texttt{a} = (\texttt{a / b}) \cdot \texttt{b} + (\texttt{a \% b}). \tag{1}\]

In words, the integer division and modulo operators should give a breakdown of a into a sum of some copies of b plus a remainder. Note that this equation also implies that specifying the rounding behavior of division is equivalent to specifying the sign behavior of the modulo operation, which will come up repeatedly later.

It’s also very uncontroversial that that remainder should have no copies of b, positive or negative, left over, which gives the constraint:

\[|\texttt{a \% b}| < |\texttt{b}|. \tag{2}\]

Every programming language I can think of satisfies these two constraints.1 So far so good. However, these two constraints don’t uniquely determine the values of a % b when a isn’t divisible by b; there are two possible values for a % b, one positive and one negative. Concretely, we could express \(-5\) as either \((-1) \cdot 3 + (-2)\) or \((-2) \cdot 3 + 1\), so (-5) % 3 could be -2 or 1.

It’s still mostly uncontroversial that, if a and b are both positive, then a % b should be nonnegative as well; we could call this constraint (3).2 However, if a is negative and b is positive, programming languages start to diverge in their behavior. Why?

Rocket Equation

Advent of Code 2019, Day 1

It’s December, so it’s time for a lot of things, including Advent of Code. I will not be able to be as competitive as I was last year, and already lost a lot of points to a really silly mistake on day 1, but I’ll be playing when I can and golfing the problems when I have time (so far: 7 + 14 bytes).

As one might expect, Day 1 is not too complex, but the second part can be analyzed to some mathematical depth and was discussed a bit on Reddit; plus, it occurred to me recently that I set up KaTeX on my blog but never used it, so I was looking for an excuse to write some equations anyway.

The problem statement for part 2, in brief: We are tasked with calculating the total mass of fuel required to launch a rocket module of a given mass. For something of mass \(m\), one can compute the directly required mass of fuel by dividing \(m\) by 3, rounding down, and subtracting 2; if the result is negative, it is taken to be 0 instead. However, the directly required fuel also requires fuel itself, calculated from its own mass by the same procedure, and that required fuel requires fuel based on its own mass, and so on until you reach fuel with 0 requirement.

[CIMC 2015 Part 2] Journey of the Blue-White Slippers

(Nontopical life update: Current 18.06 homework status: 34% (mildly screwed, probably won’t finish before I leave my cozy home for the U.S. and I usually struggle to get into the mood for homework while traveling, but I guess I’ll have to)) [18.06 status panel: 34%] (I’ve been spending most of my uptime doing said homework and running errands, and my downtime catching up on Last Week Tonight with John Oliver while farming the Flight Rising Coliseum. And, okay, making the above status panel. Live version here courtesy of Dropbox’s Public folder. No regrets.)

Day 3 (Excursions)

Morning routine snipped. We come to the middle school again to eat breakfast and gather; the contestants will be taking their tests here (accompanied by one bottle of “Buff” energy drink each) while the rest of us will be going on an excursion. Before this happens, though, two Taiwanese contestants ask me and Hsin-Po some math problems. There’s a geometry problem, which I fail to solve:

(paraphrased) In triangle △ABC, ∠A is 40° and ∠B is 60°. The angle bisector of ∠A meets BC at D; E is on AB such that ∠ADE is 30°. Find ∠DEC.

Hsin-Po figures out that, once you guess (ROT13) gur bgure boivbhf privna vf nyfb na natyr ovfrpgbe naq gurl vagrefrpg ng gur vapragre, lbh pna cebir vg ol pbafgehpgvat gur vapragre naq fubjvat sebz gur tvira natyr gung gurl vaqrrq pbvapvqr.1 Then, there’s a combinatorics problem in a book with a solution that they’re not sure about:



I always tell myself, okay, I will actually just draw something facetiously and get it over with, nobody comes to this blog to admire my GIMP mouse doodles, but then perfectionist tendencies kick in and I get carried away and it ends up taking more than an hour or so.



original sillier post

Note on notation: I’m going to use \(\text{Stab}(x)\) instead of \(G_x\) for the stabilizer subgroup and \(\text{Cl}(x)\) instead of \(^Gx\) for the conjugacy classes. For the orbit of \(x\) I’ll stick with the norm and use \(Gx\), although it’s only used in confusing summation notation that I’ll explain with words too.

We keep using this silly counting argument which I thought was something like Burnside’s lemma but actually is a lot simpler, just partitioning the set into orbits and slapping the orbit-stabilizer theorem on.

If \(G\) is the group and \(S\) is the set then


original sillier post

Note on notation: to be maximally clear, I have bolded all my vectors and put tiny arrows on them. Normal letters are usually reals, uppercase letters are usually bigger matrices. Also, \(\cdot^T\) denotes the transpose of a matrix.

Let \(\vec{\textbf{v}}_1, \ldots, \vec{\textbf{v}}_m\) be elements of \(\mathbb{R}^n\) where \(m \leq n\), i.e. column vectors with \(n\) real elements. Let \(V = [\vec{\textbf{v}}_1, \ldots, \vec{\textbf{v}}_m]\). This means pasting the column vectors together to make an \(n \times m\) matrix (\(n\) rows \(m\) columns).

Consider the thing \(\vec{\textbf{v}}_1 \wedge \vec{\textbf{v}}_2 \wedge \cdots \wedge \vec{\textbf{v}}_m\), which can be visualized as the hyperparallelogram \(\left\{\sum_{i=1}^{m} t_i\vec{\textbf{v}}_i \,\middle\vert\, t_i \in [0,1], i = 1, 2, \ldots, m \right\}\) but is apparently a different thing in a different vector space of things. We wonder how to compute the hyperarea of this hyperparallelogram.

Extreme Value Theorem

original sillier post

The “Extreme Value Theorem”, according to my old calculus textbook (Larson, Hostetler, Edwards, 8th ed.), is “A continuous function defined on a closed interval attains a minimum and a maximum on the interval.” The calculus textbook continues,

Although [it] is intuitively plausible, a proof of this theorem is not within the scope of this text.

Of course Rudin proves this, but coming from an unrigorous calculus background, the required deductions span three chapters and are very hard to motivate. That’s probably because it proves things for extremely general objects.

In particular, I have no idea why the definition of “compact” is what it is: “a set for which every open cover has a finite subcovering”. Therefore, here’s a less general but more motivated proof, from the grounds-up.


Real numbers are hard to define. We only need to know that we can add, subtract, multiply, divide, and compare them, i.e. they’re an ordered field, and that they have the least-upper-bound property. That is, if we have a set of real numbers that are bounded above, then the set has a least upper bound, a bound that “perfectly fits” the set. Precisely, the bound must be such that every element of the set is less than or equal to it, and no smaller value satisfies the same property.


Continuing the porting of stuff from betaveros.stash, and adding more stuff. Mnemonic Here’s my mnemonic table for digits, inspired by an old Martin Gardner column. I wrote from memory the first 132 digits of 2012! correctly at IMO 2012 with this table. I had remembered more, but unfortunately, if I recall correctly, I confused myself over whether I had encoded a 5 or a 2 by the S of “nose”, because this is supposed to be more of a phonetic code than a spelling one — otherwise double letters would be confusing and lots of randomly appearing digraphs would be wasted, because English is weird.

Haskell and Primes

“I have been told that any encryption becomes safer if the underlying algorithm is maximally obscured, what is most conveniently done by coding it in Haskell.” – rankk

Functional programming is terribly addicting! Partly I think the completely different way of thinking makes it feel like learning programming, and falling in love with it, all over again. Partly there’s this evil sense of satisfaction from using $s (and later <$>s and =<<s and &&&s) to improve readability for initiated Haskellers and worsen it for everybody else. Partly it’s because Learn You a Haskell for Great Good! is such a fun read — there are too many funny bits to list but my favorite so far is when the author analyzes the first verse of Avril Lavigne’s Girlfriend.

Although I think my code in Haskell tends to be more readable than in other languages, code obfuscation in Haskell is almost natural: all you have to do is refactor the wrong function to be “pointfree”, which means that even though it’s a function that takes arguments, you define it without parameters by manipulating and joining a bunch of other functions. Example (plus a few other tiny obfuscations):

QQ wordpress why no Haskell highlighting (Editor’s note from 2017: The migration should highlight this now!)

Also, for some reason, you can do this in Haskell:

(via Haskell for the Evil Genius)

Okay, but seriously now. I wrote this about my journey to learn functional programming in the programming babble post half a year ago:

The main obstacle I have is that it’s hard to optimize or get asymptotics when computation is expensive (a big problem if you’re trying to learn through Project Euler problems, particularly ones with lots of primes).


This is beautiful. Why do they have to make it sound all mysterious and difficult? That’s (the reciprocal of) the golden ratio, by the way. Transcript since the resolution is far from awesome: “Most angiosperms have alternate phyllotaxy, with leaves arranged in an ascending spiral around the stem, each successive leaf emerging 137.5° from the site of the previous one. Why 137.5°? Mathematical analyses suggest that this angle minimizes shading of the lower leaves by those above.