Haskell and Primes

“I have been told that any encryption becomes safer if the underlying algorithm is maximally obscured, what is most conveniently done by coding it in Haskell.” – rankk

Functional programming is terribly addicting! Partly I think the completely different way of thinking makes it feel like learning programming, and falling in love with it, all over again. Partly there’s this evil sense of satisfaction from using $s (and later <$>s and =<<s and &&&s) to improve readability for initiated Haskellers and worsen it for everybody else. Partly it’s because Learn You a Haskell for Great Good! is such a fun read — there are too many funny bits to list but my favorite so far is when the author analyzes the first verse of Avril Lavigne’s Girlfriend.

Although I think my code in Haskell tends to be more readable than in other languages, code obfuscation in Haskell is almost natural: all you have to do is refactor the wrong function to be “pointfree”, which means that even though it’s a function that takes arguments, you define it without parameters by manipulating and joining a bunch of other functions. Example (plus a few other tiny obfuscations):

isPrime = liftA2 (&&) (liftA2 ($) (all . ((.) (0 /=)) . rem) (flip
    takeWhile [2..] . (flip (.) $ liftA2 (*) id id) . (>=))) ((<) 1)

QQ wordpress why no Haskell highlighting (Editor’s note from 2017: The migration should highlight this now!)

Also, for some reason, you can do this in Haskell:

ghci> let 2 + 2 = 5 in 2 + 2
5

(via Haskell for the Evil Genius)

Okay, but seriously now. I wrote this about my journey to learn functional programming in the programming babble post half a year ago:

The main obstacle I have is that it’s hard to optimize or get asymptotics when computation is expensive (a big problem if you’re trying to learn through Project Euler problems, particularly ones with lots of primes).

Anyway, while browsing r/haskell, I learned from this article what I was doing wrong; it’s entirely possible to generate primes functionally and quickly. This article on functional Sieve of Eratosthenes explains why the “naive Sieve of Eratosthenes”, viz.,

primes = sieve [2..]
sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p > 0]

(copied from the article; it’s simpler than my own naive attempt) is severely slow, in fact with time complexity $\Theta(n^2 / (\log n)^2)$. Even trial division with a suitable cut-off is faster.

The reason the real Sieve is fast is that it’s easy to “cross off” the multiples of a prime. Here’s my naive Python implementation; it can definitely be optimized, but since I’ve solved quite a few Project Euler problems with it already, it’s “fast enough”.

def precomputePrimes(limit):
    """Generator of primes up to, but not including, limit. Kind of slow."""
    arr = [True] * limit
    arr[0] = False
    arr[1] = False
    for p in range(limit):
        if arr[p]:
            yield p
            for i in range(0, limit, p):
                arr[i] = False

The Python code directly loops through the multiples of each prime p, only touching roughly 1/p of the numbers. In contrast, in the naive Haskell code, the computer tries to divide every single number still in the sieve by every prime. We don’t need that; from the point of view of the prime, the numbers that divide it are easy to compute, and this ease is exactly why the Sieve of Eratosthenes is fast.

The article contains a fast Haskell implementation that reaches $\Theta(n \log n \log \log n)$, which you can check out if you want one, but to preserve the spirit of Project Euler, here’s a description without any code if you want to implement this yourself (it’s not that hard). You still keep track of the sieve as a list and sieve through it recursively, but you “cross off” prime multiples lazily, or “just-in-time”; you remember the primes in a way that you can tell, without any trial division, when you reach the next number that needs to be crossed off by a prime.