Building an Efficient Algorithm to Determine if a Number is Prime
While developing algorithms for the various mathematics concepts in the Math Tools section, I realized that there is no predefined function in JavaScript that determines whether or not a number is a Prime Number. Some concepts, such as determining the Greatest Common Factor between two numbers, can use prime numbers when determining solutions. With that, I decided to analyze the properties of prime numbers and devise a function that checks to see whether or not a number is prime.
What is a Prime Number?
By definition, a prime number is any whole number greater than 1 that can be evenly divided by only 1 and itself[ref]. When I say evenly divided, it means that when two numbers are divided, the quotient does not produce a remainder/decimal number. We can also say that a number x is divisible by a number y when y is evenly divided by x. For example, 29 is a prime number because no other number beside 1 and 29 divide evenly into 29. Any other number between 1 and 29 will produce a quotient with a remainder when divided into 29.
On the other hand, any number that can be evenly divided by 1, itself and at least one other number is called a composite number. Basically, a composite number is any positive integer that is not a prime number. For example, the number 30, can be evenly divided by 1, 2, 3, 5, 6, 10, 15 and 30. Since 30 is divisible by six other numbers besides 1 and 30, the number 30 is a composite number.
Another interesting property to take note of is that the set of numbers that evenly divide into a number, n, are the factors of n. Essentially, for any number x that evenly divides into any number y, we can say that x is a factor of y. Looking at our examples above, the set of factors of 29 is [1, 29] and the set of factors of 30 is [1, 2, 3, 5, 6, 10, 15, 30]. Consequentially, we can define a prime number as a number that only has two factors: 1 and itself. Understanding prime numbers in terms of factors will be helpful when trying to construct a prime number checking algorithm.
A simple brute force algorithm
Probably the most simple way to determine if a number n is prime is to check every number between 1 and n to see if any divide evenly into n. If at least one number does, then n is not prime. Otherwise, if all numbers between 1 and n do not evenly divide into n, then n is prime.
A possible algorithm would be implemented as follows:
Below is pseudocode for this algorithm. We will call it IsPrime():
IsPrime(n)
if n <=1 OR n mod 1 != 0 Then
return false
end IF
if n == 2 THEN
return true
end IF
for i <-- 2 to n-1
if n mod i == 0
return false
end if
end for
return true
Let's look at the algorithm's complexity. The best-case scenario for IsPrime() is when n is an even number. This is because 2 divides evenly into every even number and IsPrime() first checks to see if 2 evenly divides into n. If n is even, the loop iterates only once to see that 2 evenly divides into n and then returns false. Since IsPrime() finishes on the first iteration of the loop, the best-case time complexity is O(1) in Big-Oh notation, which indicates constant time.
The worst-case scenario, on the other hand, is when n is a prime number. This is because when n is prime, IsPrime() will iterate through the entire loop to ensure that n is not divisible by any number between 1 and n. When n is prime, the maximum number of iterations the loop will make is n-2 (1 and n is always divides evenly into n so they don't need to be checked). Therefore, worst-case time complexity is O(n). This is linear time, which means the time it takes IsPrime() to complete increases at the same rate as n.
Calculating the average-case scenario for IsPrime() is a bit more complicated to determine. However, we can say that the complexity is better than O(n/2) for several reasons. First, let's identify the set of all numbers from 1 to infinity as P. Second, let's assume that all odd numbers in are prime. Essentially, half the numbers in P are even, while the other half is odd. Consequently, if we were to pass all numbers in P as parameters into IsPrime(), we would have an equal amount of best-case and worst-case scenarios, resulting in an average complexity of O(n/2). However, in reality, not all odd numbers are primes, meaning that IsPrime() will not progress through all iterations of the loop on composite odd numbers. In fact, for a composite odd number n, the loop will never iterate more than sqrt(n) times (I'll discuss this property later). Since the ratio of prime numbers to odd numbers is not known, we can only say that the complexity of average-case scenario is better than O(n/2).
Improving the algorithm
On the whole, IsPrime() can be inefficient for large prime numbers. Let's use the prime number 67,280,421,310,721 as an example (a large number discovered to be prime in 1855 by Thomas Clausen) and trigger the worst-case scenario. Since the complexity of the worst-case scenario is in linear time, the loop must iterate about 67,280,421,310,721 times for IsPrime() to determine that it is prime. That's over 67 trillion comparison and modulo operations.
Fortunately, we can develop a more efficient algorithm by closely analyzing specific properties of prime numbers and factors. As a result, we will improve the worst-case and average-case scenarios of the brute force algorithm.
Understanding Prime factors
Although I will not go into it with great detail in this blog entry, it is important to understand the basic properties of prime factors and prime factorization. A prime factor of a number n is a prime number that, when multiplied by 1 or other prime factors of n, results in n. Prime factorization is the process of determining the prime factors of a number by constantly dividing it and its quotients further and further until it cannot be evenly divided anymore (i.e. until you reach a prime number since they have no factors besides 1 and itself). However, if number cannot be broken down into prime factors, it itself is a prime number.
Let's use 29 and 30 as examples again. No number can divide evenly into 29, and therefore cannot be factored. The number 30 on the other hand, can be broken down further into prime factors: 30 = 2x15 = 2x3x5. The prime factors of 30 are [2, 3, 5]. Because 29 has no prime factors, it itself is a prime number. Since 30 has at least one other than itself, it is a composite number.
Essentially, you can say that the prime factors of a number are its root multiplicands, with which all factors of that number can be determined. Given this, determining whether or not a number has a prime factor, that is if it is divisible by a prime number, is all that is needed to determine whether or not a number is prime. Therefore, when referring back with our original algorithm, instead of checking if every number between 1 and n divide evenly into n to see if it's prime, it is enough to simply check of only the prime numbers between 1 and n divide evenly into n.
With this, a new problem arises: how do we determine which numbers between 1 and n are prime? One option is to recursively call IsPrime() every time the loop iterates to see if the number being checked is a prime number, but that may render IsPrime() inefficient for large numbers. Another option is to keep a list of prime numbers and only compare the numbers in that list, but there are potentially an infinite number of prime numbers that we would need to keep track of. A third option is to check if only odd numbers between 1 and n divide evenly into n. As mentioned above, all even numbers are divisible by 2 and are, therefore, not prime. The algorithm would then be setup to check if n is even at the beginning. Although checking all odd numbers would be redundant given that some odd numbers are composite numbers, it is still the most efficient option.
Square Root Limit
We can still improve IsPrime() even further by looking at an important property of factors. We can reduce the number of values we need to check from 1 to n to 1 to the square root of n. Checking any number past sqrt(n) is redundant since we would have already checked its "partner" number if we check the numbers from 1 to sqrt(n) in numerical order.
To understand this better, let's look at the number 36 as an example. The factors of 36 are: [1, 2, 3, 4, 6, 9, 12, 18, 36]. The square root of 36 is 6, which means that to determine all factors of 36, we really only need to check which numbers between 2 and 6 divide evenly into 36. To understand further, let's take three factors of 36 as an example. The number 4 is a factor of 36 because 4 x 9 = 36. The number 6 is a factor because 6x6 is 36. The number 9 is a factor because 9x4 = 36. As you can see, checking whether or not 9 is a factor was redundant. We already determined that 9 was a factor because 4 was a factor. Also, when determining factors in numerical order, 4 is the factor found before 6 and 9 is the factor found after 6.
In a way you can see the square root of a number n as its "middle factor". If any number less than the square root of n is a factor of n, then there is a factor greater than the square root of n. Consequentially, if n has no factors that are less than the square root of n, then n has no factors that are greater than n. Therefore, when determining if a number n is prime, we only need to check to see if any number between 1 and sqrt(n) evenly divide into n.
This property also applies to numbers that do not have a whole number for a square root. For example, the square root of 20 = 4.4721... . To find the factors of 20, we would check if any number between 1 and the floor value of the square root of 20, which is 4, divides evenly into 20. The symmetry still applies because we see that 4 is a factor of 20 because 4x5 is 20 and 5 is a factor because 5x4 is 20.
The Algorithm Revamp
Based on these properties, we can revamp IsPrime() to be more efficient than the brute force algorithm. Below is pseudocode for the revamped algorithm:
IsPrime(n)
if n == 2 then
return true
end if
if n mod 2 != 0 AND n mod 1 == 0 AND n > 1 THEN
var sqrt <-- sqrt(n)
var i <-- 3
if sqrt mod 1 != 0 THEN
sqrt <-- Floor(sqrt);
while i < sqrt THEN
if n mod i == 0 THEN
return false
end if
i <-- i + 2;
END WHILE
return true;
END IF
END IF
return false
Let's now summarize the steps of the revamped IsPrime():
Finally, let's return once again to our worst-case and average-case scenarios. Before, IsPrime() would take linear time in both scenarios because the loop would perform about n iterations to check if a number n is prime. Since then, we have reduced the number of checks we need to perform. First, we determined that we only need to check odd numbers, reducing the number of loop iterations in half, thus decreasing our checks from n to n/2. Next, we determined that we only needed to iterate up to the square root of n, thus further reducing our checks from n/2 to sqrt (n)/2. Therefore, the complexity of our new worst-case scenario is O(sqrt (n)/2) and the complexity of our new approximate average-case scenario is O(sqrt (n)/2/2) or O(sqrt(n)/4).
In both scenarios, we can see that the algorithm has been reduced from linear time to square root time. This is a very significant improvement on our algorithm. The time it takes an algorithm that takes linear time to complete will increase at a constant rate as its input, n, gets larger. On the other hand, the time it takes an algorithm that takes square root time to complete will increase at a decreasing rate as its input, n, gets larger. This significantly reduces the time needed to process very large numbers.
To understand how significant an improvement this is, let's analyze our very large prime number, 67,280,421,310,721 again with our revamped version of IsPrime(). To demonstrate the efficiency of a square root time algorithm over a linear time algorithm, let's ignore the 1/2 for now. The square root of our number is approximately 8,202,464.34376 whose floor value is 8,202,464. That's 8 million iterations versus the 67 trillion iterations from our previous algorithm. Additionally, to understand the efficiency of the square root, all prime numbers in the range of {i : 67,280,415,671,296 <= i < 67,280,432,076,225} will take 8,202,464 iterations. Finally, since we are actually ignoring even numbers, we reduce the number of iterations further to 8,202,464 /2 = 4,101,232, only 4 million iterations.
Below is the algorithm translated in JavaScript:
function isPrime(num)
{
var n = Number(num);
if (n === 2){return true;}
if (!isNaN(n) && n % 2 !== 0 && n % 1 === 0 && n > 1)
{
var sqr = Math.sqrt(n), i = 3;
if(sqr % 1 !== 0)
{
sqr = Math.floor(sqr);
while(i <= sqr)
{
if (n % i === 0){return false;}
i = i + 2;
}
return true;
}
}
return false;
}