Numerical Recipes: Prime Numbers 1. The Wrong Way

A prime number is a natural number larger than 1 which is only divisible by 1 and itself. Simple, isn't it? Euclid already proved quite a while ago that there are infinitely many of them, so lets try and write a function that returns all primes below a given number, using the most straightforward approach. Whta would that be? Well, there is nothing less sophisticated than try and divide each number by every integer larger than 1 and smaller than the number itself, and if we never get a zero remainder then the number is prime. Putting it in python:


from time import time

def primeList1(n, verbose = True) :
t = time()
ret = []
for j in range(2,n+1) :
    jIsPrime = True
    for k in range(2,j) :
        if j % k == 0 :
            jIsPrime = False
            break
    if jIsPrime :
        ret += [j]
if verbose :
    print "Calculated primes to",n,"in",time()-t,"sec."

return ret

Do notice the feeble attempt at code optimization, with the break statement that ends the loop as soon as a divisor is found... If you try it out, it will work fine. Use it to calculate all prime numbers below 1,000, and it will probably give you an instant answer. 10,000 delays a second or two, nothing too worrying. If you try it with 100,000, you'll have time to take your coffee break before it finishes, and adding any more zeroes will require leaving your script running overnight... Not too surprising really, since to determine that, for example, 9973 is indeed a prime number, our program is having to do trial division by every number between 2 and 9972, both inclusive. All in all, the time required grows with n².

There are several very simple improvements that can speed things up dramatically, and which require a little, but just a little, mathematical insight...

First, we don't need to test every number between 2 and 9972 to determine that 9973 is prime; we can stop after the largest integer smaller than the square-root of 9973, i.e. 99. Why? Well, if a number is divisible by a number larger than its square-root, it will also be divisible by the quotient of that division, which is necesaarily smaller than the square-root. And since we have already tried (and failed) to divide by those smaller numbers, there is no need to keep going...

The resulting code is very similar to the previous one:


from time import time
from math import sqrt

def primeList2(n, verbose = True) :
t = time()
ret = []
for j in range(2,n+1) :
    jIsPrime = True
    kMax = int(sqrt(j))
    for k in range(2,kMax+1) :
        if j % k == 0 :
            jIsPrime = False
            break
    if jIsPrime :
        ret += [j]

if verbose :
    print "Calculated primes to",n,"in",time()-t,"sec."

return ret

Do try it, and feel the thrill of the new speed of the algorithm. If you are after all primes below 10,000, this new function will probably get them for you... a hundred times faster!!! If you try it for larger numbers, you will find even better speed increases, because the time required by our algorithm now grows with n^1.4.

But we are still doing a lot of superfluous checking. Think again of 9973. Once we have tried to divide it by 2 and failed, there is no point in checking with 4, 6, 8 or any other multiple of 2. So yes, we could double the speed of our algorithm by checking only odd numbers, but lets spend a little more time thinking before going back to coding... Because, after failing with 2, once we turn our attention to 3 and again fail, what's the point in trying 6, 9, 12 or any other multiple of 3? If you keep the reasoning going, it becomes obvious that you only need to do trial division by prime numbers, since by the time we test whether a composite number divides 9973, we have already tested all its prime factors, and if those have failed, this one will as well.

Now wait a minute! Isn't finding whether a number is prime or not precisely what we are after? How are we going to figure out what numbers to do trial division with? Well, we already saw in our first optimization that our division trials only need to go up to the square root of the number being tested, so we only need to know the first prime number, 2, and afterwards we can use the list of prime numbers that we are building to generate the additional items... The code to do that would look something like this:


from time import time
from math import sqrt

def primeList3(n, verbose = True) :
   t = time()
   ret = []
   for j in range(2,n+1) :
       jIsPrime = True
       kMax = sqrt(j)
       for k in ret :
           if k > kMax :
               break
           if j % k == 0 :
               jIsPrime = False
               break
       if jIsPrime :
           ret += [j]

   if verbose :
       print "Calculated primes to",n,"in",time()-t,"sec."

   return ret

If you test this code you'll surely notice the new speed increase. This last optimization doesn't make it a hundred times faster, but on my computer the new code is 3.5 faster than the previous one when calculating all primes below 100,000 and 5.5 times faster when calculating all primes below 1,000,000. And again, what really makes the new code more efficient is not that 4x increase in speed, but the fact that, the larger our input number gets, the better the new code is. It seems that now time required grows with n^1.25...

THis last version holds the fundamental ideas of really fast versions, i.e. skipping multiples of primes, but still relies on the slow ans tie consuming operation of trial division. Getting rid of it is the next optimization, which we will see in the next post.

4 comments:

UnknownMarch 18, 2009 at 5:01 AM
Check out http://www.tiawichiresearch.com/?p=44 for testing the primality of a number
UnknownMarch 18, 2009 at 10:25 AM
see also :
http://code.activestate.com/recipes/576596/
AnonymousMarch 18, 2009 at 6:24 PM
Here's really fast primality testing. Probabilistic, but good enough for all purposes and widely used:

http://eli.thegreenplace.net/2009/02/21/rabin-miller-primality-test-implementation/
JaimeMarch 18, 2009 at 8:25 PM
While I hope the code I write ends up being good enough so that this blog can serve as a repository of algorithms, and truly be a numerical cookbook, the true purpose is to give some insight into why things are done the way they are done.

I haven't benchmarked them, but the code in the links you people have posted looks as if it should perform at least as good as the best I can produce. They all are much better than what this post holds, which is why it is titled "Prime Numbers the Wrong Way"...

I actually deal with some of the algorithms in Charles and Louis links in my new post:

http://numericalrecipes.blogspot.com/2009/03/prime-numbers-2-sieving-of-erathostenes.html

Primality testing, both in probabilistic and deterministic fashions, will have to wait a little longer, but will also find their way to these pages.

There is a great review on many of these subjects that can be found here as well:

http://krenzel.info/?p=83

In any case, thanks for commenting!

Numerical Recipes

Thursday, March 5, 2009

Prime Numbers 1. The Wrong Way

4 comments:

About Me

Labels

Links

Blog Archive

Followers

Numerical Recipes

Thursday, March 5, 2009

Prime Numbers 1. The Wrong Way

4 comments:

About Me

Labels

Links

Subscribe To

Blog Archive

Followers