Tuesday, March 24, 2009

Prime Numbers 3. Wheel Factorization

In an earlier post, I described the sieve of Eratosthenes, a very fast algorithm to calculate all primes below a given limit, n. At least when programming it in python, speed is not an issue at all with this algorithm, because we start having memory problems much earlier, due to the fact that we need to keep in memory a list of size n...

This is specially problematic with python, because although all we need is a list of booleans (is prime / is not prime), which could fit in a bit each, I'm afraid python assigns several bytes to each. How many of them is probably machine dependent, but I'm almost certain it is at least 4 of them. So it is very likely that we could improve memory usage by a factor of 32, maybe more, if we could get around this. There are not-so-standard libraries that do just that, but I've sworn not to use them, and one day I may expose my failed attempts at using very long ints as bit arrays to the world, but python is definitely not the best language to mess around with low level optimizations, so I'm going to leave this line of thinking aside for the time being.

I also presented a very naive optimization, which simply discards all even numbers, which we already know are not prime, and thus cuts in half the memory requirements. More formally, what was being done there was to partition all numbers into two groups: those of the form 2·n, and those of the form 2·n+1, and then discard the former, since no prime of that form exists, and keep the latter only. This line of thinking is much more promising, as it leads to a very neat way of further reducing the memory required, known as wheel factorization.

For some odd reason that escapes my shallow understanding, the standard description of this algorithm places much emphasis on setting all numbers in circles, and then eliminating what it calls spokes. I personally don't see any benefit of such a mental image, specially since the number are going to end up arranged in a rectangular array. So for the sake of clarity, I will not honor tradition, and spare you the circles and the spokes.

We begin by choosing a first few prime numbers. Say the first three: 2, 3 and 5. We multiply them together to get M = 30. We are now going to partition the set of all numbers into M groups, of the form M·m + ki, where the ki are the numbers 1 through M, both inclusive. Most of these subsets happen to hold no prime number, and thus need not be considered at all. To find out which of them, remember that M is the product of several prime numbers, M = p1·p2 ... pn. Therefore, if ki is a multiple of any of those prime numbers, i.e. ki = pj·k'i, we can always extract the common factor pj from both summands in M·m + ki, which means that all numbers in that subset are composite. These subsets, which we discard, are very closely related to the spokes I was telling you about a while ago.

If M = 30, of the 30 potential ki we are left with just 8 (1, 7, 11, 13, 17, 19, 23 and 29), so we have cut memory required to about 27% of the straightforward implementation of the sieve of Eratosthenes. If we use the first 4 primes, we then have M = 210, and of the 210 potential ki we only need to keep 48. This improves memory use, which now is roughly 23% of the original. But you can probably see a law of diminishing returns at wortk here: the improvement in memory use gets smaller and smaller with every prime you add to the calculation of M. You can find some more data regarding this at the end of this link.

Of course we are still going to have to sieve each of our subsets, in much the same way we did with the sieve of Eratothenes. But sieving all multiples of a prime p when you have several subsets to consider is anything but trivial. Lets see how to tackle it...

The question to answer is, what integer values of m fulfill the equation M·m + ki = p·n? Using modular arithmetic, this equation is equivalent to finding M·m = -ki (mod p). This requires calculating M-1 (mod p), the modular multiplicative inverse of M modulo p, a subject already treated in a previous post. Given that, we can simply write mi = M-1·(-ki) (mod p), and therefore determine all multiples of p in the subset of ki as m = p·j + mi.

Last, but not least, we want to start sieving at the smallest multiple of p in each subset that is larger than or equal to p2, and so we want to increment mi with p·ji, where ji >= (p2-ki- M·mi) / (m·p).

Lets try to put all this ideas together in a python script. To make it work, you will need the primeListSofE function from this post, as well as the extEuclideanAlg and modInvEuclid ones from this other post...


def primeListWF(n, m, verbose = True) :
"""Returns a list of prime numbers smaller than or equal to n,
using wheel factorization, with a wheel size of the product of
all primes smaller than or equal to m
"""
t = time()
# We use the standard sieve of Eratosthenes first...
primes = primeListSofE(m)
M = 1
for p in primes :
M *= p
if verbose :
print "Size of factorization wheel is",M
# We now sieve out all multiples of these primes,including
# the primes themselves, to get the k[i]
sieve = [True] * M
for p in primes :
sieve[p-1] = False
for j in xrange(p*p,M+1,p) :
sieve[j-1] = False
k = [j+1 for j in range(M) if sieve[j]]
if verbose :
print "Memory use is only",len(k),"/",M,"=",len(k)/M
N = n
if N %M :
N += M
N //= M
maxP = int(sqrt(M*N))
if verbose:
print "Primes will be calculated up to",M*N,"not",n

sieve = [[True]*N for j in range(len(k))]
sieve[0][0] = False # Take care of the non primality of 1
for row in range(N) :
baseVal = M * row
for subset in range(len(k)) :
if sieve[subset][row] :
p = baseVal + k[subset]
primes += [p]
if p > maxP :
continue
# Sieve all multiples of p...
invMp = modInvEuclid(M,p)
for i in range(len(k)) :
mi = invMp * (-k[i] % p)
mi %= p
# ...starting at p**2
ji = (p*p - k[i] - M*mi)
if ji % (M*p) :
ji += M*p
ji //= M*p
mi+= ji*p

for r in range(mi,N,p) :
sieve[j][r] = False
if verbose :
print "Calculated primes to",n,"in",time()-t,"sec."

return primes


You can play around with this algorithm and look at the time required. It probably depends on the value of n, but there seems to be a sweet spot for speed somewhere around m = 7 (and thus M = 210) or m = 11 (M = 2310). This being a general purpose wheel factorization algorithm, it has speeds that are faster than the unoptimized sieve of Eratosthenes code, but it cannot beat the optimized code, which in fact is a particular case of wheel factorization, for m = M = 2.

On the other hand, using m = 7 I have managed to compute all primes up to 108, a feat unreachable for the previous code, at least without finding a workaround the MemoryError they produce. 109 is still out of reach, though...

No comments:

Post a Comment