Prime Time

May 03, 2016

Building an Efficient Algorithm to Determine if a Number is Prime

While developing algorithms for the various mathematics concepts in the Math Tools section, I realized that there is no predefined function in JavaScript that determines whether or not a number is a Prime Number. Some concepts, such as determining the Greatest Common Factor between two numbers, can use prime numbers when determining solutions. With that, I decided to analyze the properties of prime numbers and devise a function that checks to see whether or not a number is prime.

What is a Prime Number?

By definition, a prime number is any whole number greater than 1 that can be evenly divided by only 1 and itself[ref]. When I say evenly divided, it means that when two numbers are divided, the quotient does not produce a remainder/decimal number. We can also say that a number x is divisible by a number y when y is evenly divided by x. For example, 29 is a prime number because no other number beside 1 and 29 divide evenly into 29. Any other number between 1 and 29 will produce a quotient with a remainder when divided into 29.

On the other hand, any number that can be evenly divided by 1, itself and at least one other number is called a composite number. Basically, a composite number is any positive integer that is not a prime number. For example, the number 30, can be evenly divided by 1, 2, 3, 5, 6, 10, 15 and 30. Since 30 is divisible by six other numbers besides 1 and 30, the number 30 is a composite number.

Another interesting property to take note of is that the set of numbers that evenly divide into a number, n, are the factors of n. Essentially, for any number x that evenly divides into any number y, we can say that x is a factor of y. Looking at our examples above, the set of factors of 29 is [1, 29] and the set of factors of 30 is [1, 2, 3, 5, 6, 10, 15, 30]. Consequentially, we can define a prime number as a number that only has two factors: 1 and itself. Understanding prime numbers in terms of factors will be helpful when trying to construct a prime number checking algorithm.

A simple brute force algorithm

Probably the most simple way to determine if a number n is prime is to check every number between 1 and n to see if any divide evenly into n. If at least one number does, then n is not prime. Otherwise, if all numbers between 1 and n do not evenly divide into n, then n is prime.

A possible algorithm would be implemented as follows:

Take a number n as a parameter.
Make sure that n is a whole number greater than one. If not, return false.
Check is n is 2. If it is, return true. 2 will be checked separately since the following step will not proceed correctly when n = 2
Create a loop containing an index, i, that uses the modulo operation to determine if n % {i: 1 < i < n} results in a 0.
If at least one n mod i = 0 exists, then the loop will break and return false.
On the other hand, if every iteration of the loop occurs and it does not produce a 0, the number is prime and will therefore return true.

Below is pseudocode for this algorithm. We will call it IsPrime():

IsPrime(n)
     if n <=1 OR n mod 1 != 0 Then
         return false
     end IF

     if n == 2 THEN
         return true
     end IF
    
     for i <-- 2 to n-1
         if n mod i == 0
              return false
         end if
     end for
     return true

Let's look at the algorithm's complexity. The best-case scenario for IsPrime() is when n is an even number. This is because 2 divides evenly into every even number and IsPrime() first checks to see if 2 evenly divides into n. If n is even, the loop iterates only once to see that 2 evenly divides into n and then returns false. Since IsPrime() finishes on the first iteration of the loop, the best-case time complexity is O(1) in Big-Oh notation, which indicates constant time.

The worst-case scenario, on the other hand, is when n is a prime number. This is because when n is prime, IsPrime() will iterate through the entire loop to ensure that n is not divisible by any number between 1 and n. When n is prime, the maximum number of iterations the loop will make is n-2 (1 and n is always divides evenly into n so they don't need to be checked). Therefore, worst-case time complexity is O(n). This is linear time, which means the time it takes IsPrime() to complete increases at the same rate as n.

Calculating the average-case scenario for IsPrime() is a bit more complicated to determine. However, we can say that the complexity is better than O(n/2) for several reasons. First, let's identify the set of all numbers from 1 to infinity as P. Second, let's assume that all odd numbers in are prime. Essentially, half the numbers in P are even, while the other half is odd. Consequently, if we were to pass all numbers in P as parameters into IsPrime(), we would have an equal amount of best-case and worst-case scenarios, resulting in an average complexity of O(n/2). However, in reality, not all odd numbers are primes, meaning that IsPrime() will not progress through all iterations of the loop on composite odd numbers. In fact, for a composite odd number n, the loop will never iterate more than sqrt(n) times (I'll discuss this property later). Since the ratio of prime numbers to odd numbers is not known, we can only say that the complexity of average-case scenario is better than O(n/2).

Improving the algorithm

On the whole, IsPrime() can be inefficient for large prime numbers. Let's use the prime number 67,280,421,310,721 as an example (a large number discovered to be prime in 1855 by Thomas Clausen) and trigger the worst-case scenario. Since the complexity of the worst-case scenario is in linear time, the loop must iterate about 67,280,421,310,721 times for IsPrime() to determine that it is prime. That's over 67 trillion comparison and modulo operations.

Fortunately, we can develop a more efficient algorithm by closely analyzing specific properties of prime numbers and factors. As a result, we will improve the worst-case and average-case scenarios of the brute force algorithm.

Understanding Prime factors

Although I will not go into it with great detail in this blog entry, it is important to understand the basic properties of prime factors and prime factorization. A prime factor of a number n is a prime number that, when multiplied by 1 or other prime factors of n, results in n. Prime factorization is the process of determining the prime factors of a number by constantly dividing it and its quotients further and further until it cannot be evenly divided anymore (i.e. until you reach a prime number since they have no factors besides 1 and itself). However, if number cannot be broken down into prime factors, it itself is a prime number.

Let's use 29 and 30 as examples again. No number can divide evenly into 29, and therefore cannot be factored. The number 30 on the other hand, can be broken down further into prime factors: 30 = 2x15 = 2x3x5. The prime factors of 30 are [2, 3, 5]. Because 29 has no prime factors, it itself is a prime number. Since 30 has at least one other than itself, it is a composite number.

Essentially, you can say that the prime factors of a number are its root multiplicands, with which all factors of that number can be determined. Given this, determining whether or not a number has a prime factor, that is if it is divisible by a prime number, is all that is needed to determine whether or not a number is prime. Therefore, when referring back with our original algorithm, instead of checking if every number between 1 and n divide evenly into n to see if it's prime, it is enough to simply check of only the prime numbers between 1 and n divide evenly into n.

With this, a new problem arises: how do we determine which numbers between 1 and n are prime? One option is to recursively call IsPrime() every time the loop iterates to see if the number being checked is a prime number, but that may render IsPrime() inefficient for large numbers. Another option is to keep a list of prime numbers and only compare the numbers in that list, but there are potentially an infinite number of prime numbers that we would need to keep track of. A third option is to check if only odd numbers between 1 and n divide evenly into n. As mentioned above, all even numbers are divisible by 2 and are, therefore, not prime. The algorithm would then be setup to check if n is even at the beginning. Although checking all odd numbers would be redundant given that some odd numbers are composite numbers, it is still the most efficient option.

Square Root Limit

We can still improve IsPrime() even further by looking at an important property of factors. We can reduce the number of values we need to check from 1 to n to 1 to the square root of n. Checking any number past sqrt(n) is redundant since we would have already checked its "partner" number if we check the numbers from 1 to sqrt(n) in numerical order.

To understand this better, let's look at the number 36 as an example. The factors of 36 are: [1, 2, 3, 4, 6, 9, 12, 18, 36]. The square root of 36 is 6, which means that to determine all factors of 36, we really only need to check which numbers between 2 and 6 divide evenly into 36. To understand further, let's take three factors of 36 as an example. The number 4 is a factor of 36 because 4 x 9 = 36. The number 6 is a factor because 6x6 is 36. The number 9 is a factor because 9x4 = 36. As you can see, checking whether or not 9 is a factor was redundant. We already determined that 9 was a factor because 4 was a factor. Also, when determining factors in numerical order, 4 is the factor found before 6 and 9 is the factor found after 6.

In a way you can see the square root of a number n as its "middle factor". If any number less than the square root of n is a factor of n, then there is a factor greater than the square root of n. Consequentially, if n has no factors that are less than the square root of n, then n has no factors that are greater than n. Therefore, when determining if a number n is prime, we only need to check to see if any number between 1 and sqrt(n) evenly divide into n.

This property also applies to numbers that do not have a whole number for a square root. For example, the square root of 20 = 4.4721... . To find the factors of 20, we would check if any number between 1 and the floor value of the square root of 20, which is 4, divides evenly into 20. The symmetry still applies because we see that 4 is a factor of 20 because 4x5 is 20 and 5 is a factor because 5x4 is 20.

The Algorithm Revamp

Based on these properties, we can revamp IsPrime() to be more efficient than the brute force algorithm. Below is pseudocode for the revamped algorithm:

IsPrime(n)
    if n == 2 then
        return true
    end if

    if n mod 2 != 0 AND n mod 1 == 0 AND n > 1 THEN
        var sqrt <-- sqrt(n)
        var i <-- 3

        if sqrt mod 1 != 0 THEN
            sqrt <-- Floor(sqrt);

            while i < sqrt THEN
                if n mod i == 0 THEN
                     return false
                end if
                i <-- i + 2;
            END WHILE
            return true;
        END IF
    END IF
    return false

Let's now summarize the steps of the revamped IsPrime():

Check if n = 2 like before. If so, return true. This time around, it is checked separately because the loop will only check for odd numbers.
Check if n is even, a fraction or if it is 1 or less. If so, skip the loop and return false.
Store the square root of n and check if it's a whole number. If it is, then n is not prime, as its square root would then be a factor and we return false (checking this early is beneficial since we wouldn't need to enter the loop if this is the case)
Loop to see if any values between 2 and the floor of the square root of n divide evenly into n. If such a number is found, return false.
Exit the loop. Since no restrictions are found at this point, n is declared a prime number and the loop returns true.

Finally, let's return once again to our worst-case and average-case scenarios. Before, IsPrime() would take linear time in both scenarios because the loop would perform about n iterations to check if a number n is prime. Since then, we have reduced the number of checks we need to perform. First, we determined that we only need to check odd numbers, reducing the number of loop iterations in half, thus decreasing our checks from n to n/2. Next, we determined that we only needed to iterate up to the square root of n, thus further reducing our checks from n/2 to sqrt (n)/2. Therefore, the complexity of our new worst-case scenario is O(sqrt (n)/2) and the complexity of our new approximate average-case scenario is O(sqrt (n)/2/2) or O(sqrt(n)/4).

In both scenarios, we can see that the algorithm has been reduced from linear time to square root time. This is a very significant improvement on our algorithm. The time it takes an algorithm that takes linear time to complete will increase at a constant rate as its input, n, gets larger. On the other hand, the time it takes an algorithm that takes square root time to complete will increase at a decreasing rate as its input, n, gets larger. This significantly reduces the time needed to process very large numbers.

To understand how significant an improvement this is, let's analyze our very large prime number, 67,280,421,310,721 again with our revamped version of IsPrime(). To demonstrate the efficiency of a square root time algorithm over a linear time algorithm, let's ignore the 1/2 for now. The square root of our number is approximately 8,202,464.34376 whose floor value is 8,202,464. That's 8 million iterations versus the 67 trillion iterations from our previous algorithm. Additionally, to understand the efficiency of the square root, all prime numbers in the range of {i : 67,280,415,671,296 <= i < 67,280,432,076,225} will take 8,202,464 iterations. Finally, since we are actually ignoring even numbers, we reduce the number of iterations further to 8,202,464 /2 = 4,101,232, only 4 million iterations.

Below is the algorithm translated in JavaScript:


function isPrime(num)
{
	var n = Number(num);

	if (n === 2){return true;}

	if (!isNaN(n) && n % 2 !== 0 && n % 1 === 0 && n > 1)
	{
		var sqr = Math.sqrt(n), i = 3;

		if(sqr % 1 !== 0)
		{
			sqr = Math.floor(sqr);

			while(i <= sqr)
			{
				if (n % i === 0){return false;}
				i = i + 2;
			}
			return true;
		}
	}
	return false;
}