A famous Trail of Bits post says to stop using RSA: it’s simple enough to make people think they can implement it correctly, but full of invisible traps. In contrast, although elliptic curve cryptography (ECC) implementations could also fall into subtle traps, they usually don’t. The post conjectures that this is because ECC intimidates people into using cryptographically sound libraries instead of implementing their own.
If you do want to understand ECC just enough to produce your own trap-ridden implementation, though, this post aims to get you there. (Assuming some mathematical background chosen in a not particularly principled way, approximately what I had before writing this post.) Hence the title. Because I have a cryptographic conscience, I will still point out the traps I know of; but there are probably traps I don’t know about.
This post contains a lot of handwaving and straight-up giving up on proofs. You don’t need to know the proofs to be dangerous. The ethos is sort of like Napkin, but way shoddier and with zero pretense of pedagogical soundness. Still, here are some concrete questions that I learned the answer to while writing this post:
- Why does the group law hold, sort of intuitively?
- Why do people have to modify Curve25519 before using it to compute digital signatures?
- What is a “quadratic twist” and why should I care about it to pick a secure elliptic curve?
- How is it possible that an isogeny can be described as surjective but not injective while mapping a finite elliptic curve to another elliptic curve of the same cardinality?
- How many claws does an alligator have?
Elliptic curves have a lot of complicated algebra. If you ever studied algebra in high school and did exercises where you had to simplify or factor or graph some really complicated algebraic expression, and you learned that algebra is also the name of a field in higher mathematics, you might have assumed that working algebraists just dealt with even more complicated expressions. If you then studied algebra in college, you’d probably have realized that that’s not really what algebra is about at all; the difficulty comes from new abstractions, like a bunch of the terms above.
Well… the study of elliptic curves involves a bunch of complicated expressions like what your high school self might have imagined. Sometimes, notes will just explode into a trainwreck of terms like
\[\begin{align*}\psi_1 &= 1 \\ \psi_2 &= 2y \\ \psi_3 &= 3x^4 + 6Ax^2 + 12Bx - A^2 \\ \psi_4 &= 4y(x^6 + 5Ax^4 + 20Bx^3 - 5A^2x^2 - 4ABx - A^3 - 8B^2).\end{align*}\]
“This is REAL Math, done by REAL Mathematicians,” one is tempted to quip. The Explicit-Formulas Database is a fun place to take a gander through. I will copy formulas into this post from time to time when there’s something about them I want to call attention to, but in general we won’t do any complicated algebraic manipulations in this post. Just be prepared.
Because I’m focusing on conceptual understanding (and am lazy), this post contains almost no code, and definitely no code that’s runnable in any real programming language.
One thing many mathematically-inclined programmers run into when implementing mathematical algorithms, particularly number-theoretic ones, is that the modulo operation doesn’t behave how they expect or prefer.
In many languages, this operator is denoted `%`. Concretely, one might prefer that, if the second argument is positive, then the modulo operation would always give a nonnegative result. Under this behavior, the expression `(-5) % 3` would evaluate to `1` rather than `-2`. This is a lot more useful for number theory because then, for positive integers `n`, the `% n` operation actually maps integers to exactly the `n` canonical representatives for the residue classes. As a result, \(a \equiv b \mod n\) if and only if `a % n == b % n`. You can also do things like index into a length-`n` array with `a % n` and know that the index will be in bounds. Finally, there are also optimization opportunities: modding by a power of 2 becomes equivalent to a simple bitwise AND, which is really fast on modern computers.
A few programming languages, notably Python, do implement `%` this way. However, the majority of languages today, including pretty much everything remotely descended from C, do not; instead, `(-5) % 3` is `-2`. This post attempts to track down why.
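Here's a quick illustration of the convenient behavior in Python (the specific numbers are arbitrary examples of mine):

```python
# With a positive modulus, Python's % always lands in range(n).
assert (-5) % 3 == 1

n = 8
# Same residue class exactly when the canonical representatives agree:
assert (-5) % n == 19 % n == 3        # -5 ≡ 19 (mod 8)

# Indexing a length-n list with x % n is always in bounds, even for negative x:
ring = list(range(n))
assert ring[(-13) % n] == 3

# Modding by a power of two is a bitwise AND, even for negative inputs:
assert all(x % n == x & (n - 1) for x in range(-100, 100))
```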
The first thing to note is that there is a more important number-theoretic identity we’d like to have:
\[\texttt{a} = (\texttt{a / b}) \cdot \texttt{b} + (\texttt{a \% b}). \tag{1}\]
In words, the integer division and modulo operators should give a breakdown of `a` into a sum of some copies of `b` plus a remainder. Note that this equation also implies that specifying the rounding behavior of division is equivalent to specifying the sign behavior of the modulo operation, which will come up repeatedly later.

It's also very uncontroversial that that remainder should have no copies of `b`, positive or negative, left over, which gives the constraint:
\[|\texttt{a \% b}| < |\texttt{b}|. \tag{2}\]
Every programming language I can think of satisfies these two constraints. So far so good. However, these two constraints don't uniquely determine the value of `a % b` when `a` isn't divisible by `b`; there are two possible values for `a % b`, one positive and one negative. Concretely, we could express \(-5\) as either \((-1) \cdot 3 + (-2)\) or \((-2) \cdot 3 + 1\), so `(-5) % 3` could be `-2` or `1`.
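In code, the divergence looks like this (Python's built-in `divmod` picks the floored convention; the truncating helper below is my own, purely illustrative emulation of the C convention):

```python
a, b = -5, 3

# Floored division (Python's convention): the quotient rounds toward -infinity.
q, r = divmod(a, b)
assert (q, r) == (-2, 1)
assert a == q * b + r and abs(r) < abs(b)      # constraints (1) and (2)

# Truncated division (the C convention): the quotient rounds toward zero.
def trunc_divmod(a, b):
    q = a // b if (a < 0) == (b < 0) else -(-a // b)
    return q, a - q * b

q, r = trunc_divmod(a, b)
assert (q, r) == (-1, -2)
assert a == q * b + r and abs(r) < abs(b)      # (1) and (2) hold here too
```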
It's still mostly uncontroversial that, if `a` and `b` are both positive, then `a % b` should be nonnegative as well; we could call this constraint (3). However, if `a` is negative and `b` is positive, programming languages start to diverge in their behavior. Why?
It’s December, so it’s time for a lot of things, including Advent of Code. I will not be able to be as competitive as I was last year, and already lost a lot of points to a really silly mistake on day 1, but I’ll be playing when I can and golfing the problems when I have time (so far: 7 + 14 bytes).
As one might expect, Day 1 is not too complex, but the second part can be analyzed to some mathematical depth and was discussed a bit on Reddit; plus, it occurred to me recently that I set up KaTeX on my blog but never used it, so I was looking for an excuse to write some equations anyway.
The problem statement for part 2, in brief: We are tasked with calculating the total mass of fuel required to launch a rocket module of a given mass. For something of mass \(m\), one can compute the directly required mass of fuel by dividing \(m\) by 3, rounding down, and subtracting 2; if the result is negative, it is taken to be 0 instead. However, the directly required fuel also requires fuel itself, calculated from its own mass by the same procedure, and that required fuel requires fuel based on its own mass, and so on until you reach fuel with 0 requirement.
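In code, the procedure is short (a minimal Python sketch, not the golfed version; the function names are my own):

```python
def fuel(m):
    """Fuel directly required for a mass m: floor(m / 3) - 2, clamped at 0."""
    return max(0, m // 3 - 2)

def total_fuel(m):
    """Fuel for the module, plus fuel for that fuel, and so on until it hits 0."""
    total = 0
    f = fuel(m)
    while f > 0:
        total += f
        f = fuel(f)
    return total

# Sanity check by hand: a module of mass 1969 needs
# 654 + 216 + 70 + 21 + 5 = 966 units of fuel in total.
assert total_fuel(1969) == 966
```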
(Nontopical life update: Current 18.06 homework status: 34% (mildly screwed, probably won’t finish before I leave my cozy home for the U.S. and I usually struggle to get into the mood for homework while traveling, but I guess I’ll have to)) (I’ve been spending most of my uptime doing said homework and running errands, and my downtime catching up on Last Week Tonight with John Oliver while farming the Flight Rising Coliseum. And, okay, making the above status panel. Live version here courtesy of Dropbox’s Public folder. No regrets.)
Day 3 (Excursions)
Morning routine snipped. We come to the middle school again to eat breakfast and gather; the contestants will be taking their tests here (accompanied by one bottle of “Buff” energy drink each) while the rest of us will be going on an excursion. Before this happens, though, two Taiwanese contestants ask me and Hsin-Po some math problems. There’s a geometry problem, which I fail to solve:
(paraphrased) In triangle △ABC, ∠A is 40° and ∠B is 60°. The angle bisector of ∠A meets BC at D; E is on AB such that ∠ADE is 30°. Find ∠DEC.
Hsin-Po figures out that, once you guess (ROT13) gur bgure boivbhf privna vf nyfb na natyr ovfrpgbe naq gurl vagrefrpg ng gur vapragre, lbh pna cebir vg ol pbafgehpgvat gur vapragre naq fubjvat sebz gur tvira natyr gung gurl vaqrrq pbvapvqr. Then, there’s a combinatorics problem in a book with a solution that they’re not sure about:
(streak)
I always tell myself, okay, I will actually just draw something facetiously and get it over with, nobody comes to this blog to admire my GIMP mouse doodles, but then perfectionist tendencies kick in and I get carried away and it ends up taking more than an hour or so.
original sillier post
Note on notation: I’m going to use \(\text{Stab}(x)\) instead of \(G_x\) for the stabilizer subgroup and \(\text{Cl}(x)\) instead of \(^Gx\) for the conjugacy classes. For the orbit of \(x\) I’ll stick with the norm and use \(Gx\), although it’s only used in confusing summation notation that I’ll explain with words too.
We keep using this silly counting argument which I thought was something like Burnside’s lemma but actually is a lot simpler, just partitioning the set into orbits and slapping the orbit-stabilizer theorem on.
If \(G\) is the group and \(S\) is the set then
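\[|S| = \sum_{Gx} |Gx| = \sum_{Gx} \frac{|G|}{|\text{Stab}(x)|},\]

where each sum ranges over the distinct orbits \(Gx\), taking one representative \(x\) from each (that's the confusing summation notation I promised to explain in words).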
original sillier post
Note on notation: to be maximally clear, I have bolded all my vectors and put tiny arrows on them. Normal letters are usually reals, uppercase letters are usually bigger matrices. Also, \(\cdot^T\) denotes the transpose of a matrix.
Let \(\vec{\textbf{v}}_1, \ldots, \vec{\textbf{v}}_m\) be elements of \(\mathbb{R}^n\) where \(m \leq n\), i.e. column vectors with \(n\) real elements. Let \(V = [\vec{\textbf{v}}_1, \ldots, \vec{\textbf{v}}_m]\). This means pasting the column vectors together to make an \(n \times m\) matrix (\(n\) rows \(m\) columns).
Consider the thing \(\vec{\textbf{v}}_1 \wedge \vec{\textbf{v}}_2 \wedge \cdots \wedge \vec{\textbf{v}}_m\), which can be visualized as the hyperparallelogram \(\left\{\sum_{i=1}^{m} t_i\vec{\textbf{v}}_i \,\middle\vert\, t_i \in [0,1], i = 1, 2, \ldots, m \right\}\) but is apparently a different thing in a different vector space of things. We wonder how to compute the hyperarea of this hyperparallelogram.
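(For intuition with small cases: when \(m = 1\) the "hyperparallelogram" is just a segment, and its hyperarea is the length \(\lVert \vec{\textbf{v}}_1 \rVert\); when \(m = 2\) and \(n = 3\) it's an ordinary parallelogram sitting in space, with area \(\lVert \vec{\textbf{v}}_1 \rVert \, \lVert \vec{\textbf{v}}_2 \rVert \sin\theta\), where \(\theta\) is the angle between the two vectors.)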
original sillier post
The “Extreme Value Theorem”, according to my old calculus textbook (Larson, Hostetler, Edwards, 8th ed.), is “A continuous function defined on a closed interval attains a minimum and a maximum on the interval.” The calculus textbook continues,
Although [it] is intuitively plausible, a proof of this theorem is not within the scope of this text.
Of course Rudin proves this, but for someone coming from an unrigorous calculus background, the required deductions span three chapters and are very hard to motivate. That's probably because Rudin proves things for extremely general objects.
In particular, I have no idea why the definition of “compact” is what it is: “a set for which every open cover has a finite subcovering”. Therefore, here's a less general but more motivated proof, from the ground up.
Definitions
Real numbers are hard to define. We only need to know that we can add, subtract, multiply, divide, and compare them, i.e. they're an ordered field, and that they have the least-upper-bound property. That is, if we have a nonempty set of real numbers that is bounded above, then the set has a least upper bound, a bound that “perfectly fits” the set. Precisely, the bound must be such that every element of the set is less than or equal to it, and no smaller value satisfies the same property.
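In symbols: for every nonempty \(S \subseteq \mathbb{R}\) that is bounded above, there exists \(b \in \mathbb{R}\) (written \(\sup S\)) such that

\[x \le b \text{ for all } x \in S, \qquad \text{and} \qquad \text{for every } b' < b \text{ there exists } x \in S \text{ with } x > b'.\]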
Continuing the porting of stuff from betaveros.stash, and adding more stuff.
Mnemonic

Here's my mnemonic table for digits, inspired by an old Martin Gardner column. I wrote from memory the first 132 digits of 2012! correctly at IMO 2012 with this table. I had remembered more, but unfortunately, if I recall correctly, I confused myself over whether I had encoded a 5 or a 2 by the S of “nose”, because this is supposed to be more of a phonetic code than a spelling one — otherwise double letters would be confusing and lots of randomly appearing digraphs would be wasted, because English is weird.
“I have been told that any encryption becomes safer if the underlying algorithm is maximally obscured, what is most conveniently done by coding it in Haskell.” – rankk
Functional programming is terribly addicting! Partly I think the completely different way of thinking makes it feel like learning programming, and falling in love with it, all over again. Partly there's this evil sense of satisfaction from using `$`s (and later `<$>`s and `=<<`s and `&&&`s) to improve readability for initiated Haskellers and worsen it for everybody else. Partly it's because Learn You a Haskell for Great Good! is such a fun read — there are too many funny bits to list but my favorite so far is when the author analyzes the first verse of Avril Lavigne's Girlfriend.
Although I think my code in Haskell tends to be more readable than in other languages, code obfuscation in Haskell is almost natural: all you have to do is refactor the wrong function to be “pointfree”, which means that even though it’s a function that takes arguments, you define it without parameters by manipulating and joining a bunch of other functions. Example (plus a few other tiny obfuscations):
QQ wordpress why no Haskell highlighting (Editor’s note from 2017: The migration should highlight this now!)
Also, for some reason, you can do this in Haskell:
(via Haskell for the Evil Genius)
Okay, but seriously now. I wrote this about my journey to learn functional programming in the programming babble post half a year ago:
The main obstacle I have is that it’s hard to optimize or get asymptotics when computation is expensive (a big problem if you’re trying to learn through Project Euler problems, particularly ones with lots of primes).