A famous Trail of Bits post says to stop using
RSA: it’s simple enough to make people think they can implement it
correctly, but full of invisible traps. In contrast, although elliptic
curve cryptography (ECC) implementations could also fall into subtle
traps, they usually don’t. The post conjectures that this is because ECC
intimidates people into using cryptographically sound libraries instead
of implementing their own.
If you do want to understand ECC just enough to produce your own
trap-ridden implementation, though, this post aims to get you there.
(Assuming some mathematical background chosen in a not particularly
principled way, approximately what I had before writing this post.)
Hence the title. Because I have a cryptographic conscience, I will still
point out the traps I know of; but there are probably traps I don’t know
about.
This post contains a lot of handwaving and straight-up giving up on
proofs. You don’t need to know the proofs to be dangerous. The ethos is
sort of like Napkin,
but way shoddier and with zero pretense of pedagogical soundness. Still,
here are some concrete questions that I learned the answer to while
writing this post:
- Why does the group law hold, sort of intuitively?
- Why do people have to modify Curve25519 before using it to compute
digital signatures?
- What is a “quadratic twist” and why should I care about it to pick a
secure elliptic curve?
- How is it possible that an isogeny can be described as surjective
but not injective while mapping a finite elliptic curve to another
elliptic curve of the same cardinality?
- How many claws does an alligator have?
Elliptic curves have a lot of complicated algebra. If you ever
studied algebra in high school and did exercises where you had to
simplify or factor or graph some really complicated algebraic
expression, and you learned that algebra is also the name of a field in
higher mathematics, you might have assumed that working algebraists just
dealt with even more complicated expressions. If you then studied
algebra in college, you’d probably have realized that that’s not really
what algebra is about at all; the difficulty comes from new
abstractions, like a bunch of the terms above.
Well… the study of elliptic curves involves a bunch of complicated
expressions like what your high school self might have imagined.
Sometimes, notes will just explode into a trainwreck of terms like
\[\begin{aligned}\psi_1 &= 1 \\ \psi_2
&= 2y \\ \psi_3 &= 3x^4 + 6Ax^2 + 12Bx - A^2 \\ \psi_4 &=
4y(x^6 + 5Ax^4 + 20Bx^3 - 5A^2x^2 - 4ABx - A^3 -
8B^2).\end{aligned}\]
“This is REAL Math, done by REAL Mathematicians,” one is tempted to
quip. The Explicit-Formulas
Database is a fun place to take a gander through. I will copy
formulas into this post from time to time when there’s something about
them I want to call attention to, but in general we won’t do any
complicated algebraic manipulations in this post. Just be prepared.
Because I’m focusing on conceptual understanding (and am lazy), this
post contains almost no code, and definitely no code that’s runnable in
any real programming language.
One thing many mathematically-inclined programmers run into when
implementing mathematical algorithms, particularly number-theoretic
ones, is that the modulo
operation doesn’t behave how they expect or prefer.
In many languages, this operator is denoted %.
Concretely, one might prefer that, if the second argument is positive,
then the modulo operation would always give a nonnegative result. Under
this behavior, the expression (-5) % 3 would evaluate to 1 rather than
-2. This is a lot more useful for number theory because then for
positive integers n, the % n operation actually maps integers to
exactly the n canonical representatives for the residue classes. As a
result, \(a \equiv b \mod n\) if and only if a % n == b % n. You can
also do things like index into a length-n array with a % n and know that
the index will be in-bounds. Finally, there are also optimization
opportunities: modding by a power of 2 becomes equivalent to a simple
bitwise AND, which is really fast on modern computers.
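To make those benefits concrete, here is a small Haskell sketch;
Haskell's mod already behaves the preferred way described here, and the
helper names below are mine, purely for illustration:
import Data.Bits ((.&.))

-- mod in Haskell is floored, so for a positive modulus the result is
-- always in [0, n), regardless of the sign of the first argument.
canonical :: Int -> Int -> Int
canonical n a = a `mod` n          -- e.g. canonical 3 (-5) == 1

-- Safe indexing into a nonempty length-n list, even for negative i.
wrapIndex :: [a] -> Int -> a
wrapIndex xs i = xs !! (i `mod` length xs)

-- Modding by a power of 2 is a bitwise AND (two's complement).
mod8 :: Int -> Int
mod8 a = a .&. 7                   -- equals a `mod` 8, e.g. mod8 (-5) == 3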
A few programming languages, notably Python, do implement % this way.
However, the majority of languages today, including pretty much
everything remotely descended from C, do not; instead, (-5) % 3 is -2.
This post attempts to track down why.
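Haskell happens to expose both conventions side by side, which makes
the split easy to demonstrate in a quick ghci session: mod is floored
like Python's %, while rem truncates toward zero like C's.
ghci> (-5) `mod` 3
1
ghci> (-5) `rem` 3
-2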
The first thing to note is that there is a more important
number-theoretic identity we’d like to have:
\[\texttt{a} = (\texttt{a / b}) \cdot
\texttt{b} + (\texttt{a \% b}). \tag{1}\]
In words, the integer division and modulo operators should give a
breakdown of a into a sum of some copies of b
plus a remainder. Note that this equation also implies that specifying
the rounding behavior of division is equivalent to specifying the sign
behavior of the modulo operation, which will come up repeatedly
later.
It’s also very uncontroversial that that remainder should have no
copies of b, positive or negative, left over, which gives
the constraint:
\[|\texttt{a \% b}| < |\texttt{b}|.
\tag{2}\]
Every programming language I can think of satisfies these two
constraints. So far so good. However, these two
constraints don’t uniquely determine the values of a % b when a isn’t
divisible by b; there are two possible values for a % b, one positive
and one negative. Concretely, we could express \(-5\) as either
\((-1) \cdot 3 + (-2)\) or \((-2) \cdot 3 + 1\), so (-5) % 3 could be
-2 or 1.
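Those two decompositions are exactly what Haskell's divMod (floored)
and quotRem (truncating) return, and both pairs satisfy constraints (1)
and (2):
ghci> divMod (-5) 3
(-2,1)
ghci> quotRem (-5) 3
(-1,-2)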
It’s still mostly uncontroversial that, if a and b are both positive,
then a % b should be nonnegative as well; we could call this constraint
(3). However, if a is negative and b is positive,
programming languages start to diverge in their behavior. Why?
It’s December, so it’s time for a lot of things, including Advent of Code. I will not be able
to be as competitive as I was last year, and already lost a lot of
points to a really silly mistake on day 1, but I’ll be playing when I
can and golfing
the problems when I have time (so far: 7 + 14 bytes).
As one might expect, Day 1 is not too complex,
but the second part can be analyzed to some mathematical depth and was
discussed a bit on Reddit; plus, it occurred to me recently that I set
up KaTeX on my blog but never used it, so I was looking for an excuse to
write some equations anyway.
The problem statement for part 2, in brief: We are tasked with
calculating the total mass of fuel required to launch a rocket module of
a given mass. For something of mass \(m\), one can compute the directly required
mass of fuel by dividing \(m\) by 3,
rounding down, and subtracting 2; if the result is negative, it is taken
to be 0 instead. However, the directly required fuel also requires fuel
itself, calculated from its own mass by the same procedure, and that
required fuel requires fuel based on its own mass, and so on until you
reach fuel with 0 requirement.
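As a minimal sketch of that procedure (the function names are mine, and
reading the puzzle input is omitted), the part 2 computation in Haskell
could look like:
-- Fuel directly required by a mass: divide by 3 rounding down,
-- subtract 2, and clamp at 0.
fuel :: Int -> Int
fuel m = max 0 (m `div` 3 - 2)

-- Feed each fuel amount back into the same formula until it hits 0,
-- then add everything up, excluding the module's own mass.
totalFuel :: Int -> Int
totalFuel = sum . takeWhile (> 0) . tail . iterate fuel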
(Nontopical life update: Current 18.06 homework status: 34% (mildly
screwed, probably won’t finish before I leave my cozy home for the U.S.
and I usually struggle to get into the mood for homework while
traveling, but I guess I’ll have to))
(I’ve been spending most of my uptime doing said homework and running
errands, and my downtime catching up on Last Week Tonight with
John Oliver while farming the Flight Rising Coliseum. And, okay,
making the above status panel.
Live
version here courtesy of Dropbox’s Public folder. No regrets.)
Day 3 (Excursions)
Morning routine snipped. We come to the middle school again to eat
breakfast and gather; the contestants will be taking their tests here
(accompanied by one bottle of “Buff” energy drink each) while the rest
of us will be going on an excursion. Before this happens, though, two
Taiwanese contestants ask me and Hsin-Po some math problems. There’s a
geometry problem, which I fail to solve:
(paraphrased) In triangle △ABC, ∠A is 40° and ∠B is 60°. The angle
bisector of ∠A meets BC at D; E is on AB such that ∠ADE is 30°. Find
∠DEC.
Hsin-Po figures out that, once you guess (ROT13) gur bgure boivbhf
privna vf nyfb na natyr ovfrpgbe naq gurl vagrefrpg ng gur vapragre, lbh
pna cebir vg ol pbafgehpgvat gur vapragre naq fubjvat sebz gur tvira
natyr gung gurl vaqrrq pbvapvqr. Then, there’s a
combinatorics problem in a book with a solution that they’re not sure
about:
I always tell myself, okay, I will actually just draw something
facetiously and get it over with, nobody comes to this blog to admire my
GIMP mouse doodles, but then perfectionist tendencies kick in and I get
carried away and it ends up
taking more than an hour or so.

Note on notation: I’m going to use \(\text{Stab}(x)\) instead of \(G_x\) for the stabilizer subgroup and \(\text{Cl}(x)\) instead of \(^Gx\) for the conjugacy classes. For the
orbit of \(x\) I’ll stick with the norm
and use \(Gx\), although it’s only used
in confusing summation notation that I’ll explain with words too.
We keep using this silly counting argument which I thought was
something like Burnside’s lemma but actually is a lot simpler, just
partitioning the set into orbits and slapping the orbit-stabilizer
theorem on.
If \(G\) is the group and \(S\) is the set, then
\[|S| = \sum |Gx| = \sum \frac{|G|}{|\text{Stab}(x)|},\]
where both sums are taken over one representative \(x\) from each
orbit.
Note on notation: to be maximally clear, I have bolded all my vectors
and put tiny arrows on them. Normal letters are usually reals, uppercase
letters are usually bigger matrices. Also, \(\cdot^T\) denotes the transpose of a
matrix.
Let \(\vec{\textbf{v}}_1, \ldots,
\vec{\textbf{v}}_m\) be elements of \(\mathbb{R}^n\) where \(m \leq n\), i.e. column vectors with \(n\) real elements. Let \(V = [\vec{\textbf{v}}_1, \ldots,
\vec{\textbf{v}}_m]\). This means pasting the column vectors
together to make an \(n \times m\)
matrix (\(n\) rows \(m\) columns).
Consider the thing \(\vec{\textbf{v}}_1
\wedge \vec{\textbf{v}}_2 \wedge \cdots \wedge
\vec{\textbf{v}}_m\), which can be visualized as the
hyperparallelogram \(\left\{\sum_{i=1}^{m}
t_i\vec{\textbf{v}}_i \,\middle\vert\, t_i \in [0,1], i = 1, 2, \ldots,
m \right\}\) but is apparently a different thing in a different
vector space of things. We wonder how to compute the hyperarea
of this hyperparallelogram.
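For reference, the standard answer, stated here without proof, is the
Gram-determinant formula:
\[\text{hyperarea of } \vec{\textbf{v}}_1 \wedge \cdots \wedge
\vec{\textbf{v}}_m = \sqrt{\det(V^T V)},\]
where \(V^T V\) is the \(m \times m\) matrix whose entries are the
pairwise dot products \(\vec{\textbf{v}}_i \cdot \vec{\textbf{v}}_j\).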
The “Extreme Value Theorem”, according to my old calculus textbook
(Larson, Hostetler, Edwards, 8th ed.), is “A continuous function defined
on a closed interval attains a minimum and a maximum on the interval.”
The calculus textbook continues,
Although [it] is intuitively plausible, a proof of this theorem is
not within the scope of this text.
Of course Rudin proves this, but coming from an unrigorous calculus
background, the required deductions span three chapters and are very
hard to motivate. That’s probably because it proves things for extremely
general objects.
In particular, I have no idea why the definition of “compact” is what
it is: “a set for which every open cover has a finite subcovering”.
Therefore, here’s a less general but more motivated proof, from the
ground up.
Definitions
Real numbers are hard to define. We only need to
know that we can add, subtract, multiply, divide, and compare them,
i.e. they’re an ordered field, and that they have the
least-upper-bound property. That is, if we have a set
of real numbers that are bounded above, then the set has a least
upper bound, a bound that “perfectly fits” the set. Precisely, the
bound must be such that every element of the set is less than or equal
to it, and no smaller value satisfies the same property.
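In symbols, just restating that sentence: \(b = \sup S\) exactly when
\[\forall x \in S,\ x \le b \quad\text{and}\quad \forall b' < b,\
\exists x \in S \text{ with } x > b'.\]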
Continuing the porting of stuff from betaveros.stash, and adding more stuff.
Mnemonic
Here’s my mnemonic table for digits, inspired by an old Martin Gardner column. I wrote from memory the first 132 digits of 2012! correctly at IMO 2012 with this table. I had remembered more, but unfortunately, if I recall correctly, I confused myself over whether I had encoded a 5 or a 2 by the S of “nose”, because this is supposed to be more of a phonetic code than a spelling one — otherwise double letters would be confusing and lots of randomly appearing digraphs would be wasted, because English is weird.
“I have been told that any encryption becomes safer if the underlying
algorithm is maximally obscured, what is most conveniently done by
coding it in Haskell.” – rankk
Functional programming is terribly addicting! Partly I think the
completely different way of thinking makes it feel like learning
programming, and falling in love with it, all over again. Partly there’s
this evil sense of satisfaction from using $s (and later <$>s and
=<<s and &&&s) to improve readability for initiated
Haskellers and worsen it for everybody else. Partly it’s because
Learn You a Haskell for Great
Good! is such a fun read — there are too many funny bits to list
but my favorite so far is when the author analyzes the first verse of
Avril Lavigne’s Girlfriend.
Although I think my code in Haskell tends to be more readable than in
other languages, code obfuscation in Haskell is almost natural: all you
have to do is refactor the wrong function to be “pointfree”, which means
that even though it’s a function that takes arguments, you define it
without parameters by manipulating and joining a bunch of other
functions. Example (plus a few other tiny obfuscations):
isPrime = liftA2 (&&) (liftA2 ($) (all . ((.) (0 /=)) . rem) (flip takeWhile [2..] . (flip (.) $ liftA2 (*) id id) . (>=))) ((<) 1)
QQ wordpress why no Haskell highlighting (Editor’s note from
2017: The migration should highlight this now!)
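If you want to decode that one: liftA2 lives in Control.Applicative,
and as far as I can tell the definition unfolds to plain trial division
up to the square root. Here is my own pointful sketch of it:
-- A pointful reading of the obfuscated isPrime: n is prime iff no d
-- with d * d <= n divides it, and n > 1.
isPrime' :: Int -> Bool
isPrime' n = all (\d -> n `rem` d /= 0) (takeWhile (\d -> d * d <= n) [2 ..]) && n > 1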
Also, for some reason, you can do this in Haskell:
ghci> let 2 + 2 = 5 in 2 + 2
5
(via Haskell for the
Evil Genius)
Okay, but seriously now. I wrote this about my journey to learn
functional programming in the
programming babble post half a
year ago:
The main obstacle I have is that it’s hard to optimize or get
asymptotics when computation is expensive (a big problem if you’re
trying to learn through Project Euler problems, particularly ones with
lots of primes).