Thursday, March 5, 2009

Prime Numbers 1. The Wrong Way

A prime number is a natural number larger than 1 which is only divisible by 1 and itself. Simple, isn't it? Euclid already proved quite a while ago that there are infinitely many of them, so lets try and write a function that returns all primes below a given number, using the most straightforward approach. Whta would that be? Well, there is nothing less sophisticated than try and divide each number by every integer larger than 1 and smaller than the number itself, and if we never get a zero remainder then the number is prime. Putting it in python:

from time import time

def primeList1(n, verbose = True) :
t = time()
ret = []
for j in range(2,n+1) :
jIsPrime = True
for k in range(2,j) :
if j % k == 0 :
jIsPrime = False
break
if jIsPrime :
ret += [j]
if verbose :
print "Calculated primes to",n,"in",time()-t,"sec."

return ret

Do notice the feeble attempt at code optimization, with the break statement that ends the loop as soon as a divisor is found... If you try it out, it will work fine. Use it to calculate all prime numbers below 1,000, and it will probably give you an instant answer. 10,000 delays a second or two, nothing too worrying. If you try it with 100,000, you'll have time to take your coffee break before it finishes, and adding any more zeroes will require leaving your script running overnight... Not too surprising really, since to determine that, for example, 9973 is indeed a prime number, our program is having to do trial division by every number between 2 and 9972, both inclusive. All in all, the time required grows with n2.

There are several very simple improvements that can speed things up dramatically, and which require a little, but just a little, mathematical insight...

First, we don't need to test every number between 2 and 9972 to determine that 9973 is prime; we can stop after the largest integer smaller than the square-root of 9973, i.e. 99. Why? Well, if a number is divisible by a number larger than its square-root, it will also be divisible by the quotient of that division, which is necesaarily smaller than the square-root. And since we have already tried (and failed) to divide by those smaller numbers, there is no need to keep going...

The resulting code is very similar to the previous one:

from time import time
from math import sqrt

def primeList2(n, verbose = True) :
t = time()
ret = []
for j in range(2,n+1) :
jIsPrime = True
kMax = int(sqrt(j))
for k in range(2,kMax+1) :
if j % k == 0 :
jIsPrime = False
break
if jIsPrime :
ret += [j]

if verbose :
print "Calculated primes to",n,"in",time()-t,"sec."

return ret


Do try it, and feel the thrill of the new speed of the algorithm. If you are after all primes below 10,000, this new function will probably get them for you... a hundred times faster!!! If you try it for larger numbers, you will find even better speed increases, because the time required by our algorithm now grows with n1.4.

But we are still doing a lot of superfluous checking. Think again of 9973. Once we have tried to divide it by 2 and failed, there is no point in checking with 4, 6, 8 or any other multiple of 2. So yes, we could double the speed of our algorithm by checking only odd numbers, but lets spend a little more time thinking before going back to coding... Because, after failing with 2, once we turn our attention to 3 and again fail, what's the point in trying 6, 9, 12 or any other multiple of 3? If you keep the reasoning going, it becomes obvious that you only need to do trial division by prime numbers, since by the time we test whether a composite number divides 9973, we have already tested all its prime factors, and if those have failed, this one will as well.

Now wait a minute! Isn't finding whether a number is prime or not precisely what we are after? How are we going to figure out what numbers to do trial division with? Well, we already saw in our first optimization that our division trials only need to go up to the square root of the number being tested, so we only need to know the first prime number, 2, and afterwards we can use the list of prime numbers that we are building to generate the additional items... The code to do that would look something like this:


from time import time
from math import sqrt

def primeList3(n, verbose = True) :
t = time()
ret = []
for j in range(2,n+1) :
jIsPrime = True
kMax = sqrt(j)
for k in ret :
if k > kMax :
break
if j % k == 0 :
jIsPrime = False
break
if jIsPrime :
ret += [j]

if verbose :
print "Calculated primes to",n,"in",time()-t,"sec."

return ret


If you test this code you'll surely notice the new speed increase. This last optimization doesn't make it a hundred times faster, but on my computer the new code is 3.5 faster than the previous one when calculating all primes below 100,000 and 5.5 times faster when calculating all primes below 1,000,000. And again, what really makes the new code more efficient is not that 4x increase in speed, but the fact that, the larger our input number gets, the better the new code is. It seems that now time required grows with n1.25...

THis last version holds the fundamental ideas of really fast versions, i.e. skipping multiples of primes, but still relies on the slow ans tie consuming operation of trial division. Getting rid of it is the next optimization, which we will see in the next post.

4 comments:

  1. Check out http://www.tiawichiresearch.com/?p=44 for testing the primality of a number

    ReplyDelete
  2. see also :
    http://code.activestate.com/recipes/576596/

    ReplyDelete
  3. Here's really fast primality testing. Probabilistic, but good enough for all purposes and widely used:

    http://eli.thegreenplace.net/2009/02/21/rabin-miller-primality-test-implementation/

    ReplyDelete
  4. While I hope the code I write ends up being good enough so that this blog can serve as a repository of algorithms, and truly be a numerical cookbook, the true purpose is to give some insight into why things are done the way they are done.

    I haven't benchmarked them, but the code in the links you people have posted looks as if it should perform at least as good as the best I can produce. They all are much better than what this post holds, which is why it is titled "Prime Numbers the Wrong Way"...

    I actually deal with some of the algorithms in Charles and Louis links in my new post:

    http://numericalrecipes.blogspot.com/2009/03/prime-numbers-2-sieving-of-erathostenes.html

    Primality testing, both in probabilistic and deterministic fashions, will have to wait a little longer, but will also find their way to these pages.

    There is a great review on many of these subjects that can be found here as well:

    http://krenzel.info/?p=83

    In any case, thanks for commenting!

    ReplyDelete