Tom Says: Safe code is boring code!
I recently cleared away the majority of my feeds in Google Reader, meaning that some of the infrequent posters' items finally saw the light of day. One was a link to Project Euler (via Socks and Sandals), a site with many math problems designed to be solved by programming.
Of course prime numbers steal the show! I pulled out the so-called "beautiful" prime number sieve written in Newsqueak and presented by Rob Pike in the video "Advanced Topics in Programming Languages: Concurrency/message passing Newsqueak." I guess I think it's beautiful too, because I found it more understandable than any other sieve implementation I've seen.
As the Newsqueak interpreter, squint has limitations that cause it to bail out on big numbers fairly quickly, I ported it to Ruby:
# Find the 10001th prime
$primes = 0
$prime = 0
# From solution 3
class Integer
def prime?
2.upto(self/2) do |i|
return false if self % i == 0
end
true
end
end
# Based on newsqueak prime number sieve
class Counter
def initialize(chan)
@chan = chan
end
def go
i = 2
loop do
@chan.receive i
break if $primes == 10001
i += 1
end
end
end
class Filter
def receive(i)
unless @num
@num = i
#puts "#{@num} #{@num.prime?}"
$primes += 1
$prime = @num
@dest = Filter.new
end
if (i % @num) != 0
@dest.receive(i)
end
end
end
Counter.new(Filter.new).go
puts $prime
If you've seen the original implementation (in squint/progs/sieve3 of the distribution linked above), you know that I haven't changed much. I just sort of contorted objects to make them behave a little like Newsqueak's "channels" and let it be.
If you haven't seen that presentation and aren't familiar with sieves, it probably looks like nonsense. It's just a prime number sieve.
The idea is that we make a chain of prime number filters and string them together. The Counter simply outputs integers indefinitely into the first Filter for the smallest prime number, two. This filter only passes future numbers to the next filter if they are not divisible by two. The first number to get through every filter is the next prime, three. This number becomes the next filter, and the only numbers after that to get through all filters are those that are not divisible by two or three. New primes are those that pass through every filter.
The chain is meant to grow indefinitely as you find more primes. I didn't spend time improving this implementation because the chain's growth was stifled too quickly: Ruby simply ran out of stack space well before the problem was solved.
The only logical thing to do was rewrite it in a lower-level language like C. After many iterations, I wound up with this:
#include <stdio.h>
/* Find the 10001th prime */
#define PRIMES 10001
int main()
{
int primes[PRIMES];
int num_primes = 0;
int i, prime;
for(i = 0; i < PRIMES; i++)
primes[i] = -1;
for(i = 2; num_primes < PRIMES; i++)
{
for(prime = 0; ; prime++)
{
if(prime == num_primes)
{
primes[prime] = i;
num_primes++;
break;
}
else if(i % primes[prime] == 0)
break;
}
}
printf("%d\n", primes[PRIMES-1]);
return 0;
}
You'll notice that:
I was satisfied for a while. After all, it found the 10001th prime in 0.4 seconds. I only became unhappy when I saw that someone had reported to use the same strategy in their program and solved the problem in 0.04 seconds, and found the 1,000,000th prime in 20 seconds. Mine took 31 seconds to find the 100,000th prime. Either that guy was lying, or I wrote bad code.
I haven't decided which of those is the problem, but I was inspired to do something crazy: make it multi-threaded! After all, I was working on a dual-core computer; maybe if I reached 200% CPU usage things would go a little faster.
I should mention that this is my first ever attempt at using POSIX threads. And there are likely race conditions I missed.
#include <stdio.h>
#include <pthread.h>
#include <assert.h>
#define PRIMES 100010
int primes[PRIMES];
pthread_t thread_a, thread_b;
pthread_mutex_t next_mutex, count_mutex;
int next, a_grabbed = 0, b_grabbed = 0;
int done = 0;
int min_grabbed()
{
int ret;
pthread_mutex_lock(&next_mutex);
ret = a_grabbed < b_grabbed ? a_grabbed : b_grabbed;
pthread_mutex_unlock(&next_mutex);
return ret;
}
int grab_next()
{
int ret;
pthread_mutex_lock(&next_mutex);
ret = next;
next++;
if(pthread_equal(pthread_self(), thread_a))
a_grabbed = ret;
else
b_grabbed = ret;
pthread_mutex_unlock(&next_mutex);
return ret;
}
void found_one(int prime, int val)
{
primes[prime] = val;
if(prime == PRIMES-1)
done = 1;
}
/* Thread task */
void *find_primes()
{
int i, prime, min;
min = 0;
for(i = grab_next(); !done; i = grab_next())
{
for(prime = 0; !done; prime++)
{
if(primes[prime] == -1)
{
/* wait for the other thread to catch up before making a decision */
while(i > min)
min = min_grabbed();
/* if the other thread found a prime while we were waiting... */
if(primes[prime] != -1)
continue;
found_one(prime, i);
break;
}
else
{
if(i % primes[prime] == 0)
break;
}
}
}
printf("finished\n");
return NULL;
}
int main()
{
int i;
for(i = 0; i < PRIMES; i++)
primes[i] = -1;
next = 2;
pthread_mutex_init(&next_mutex, NULL);
pthread_mutex_init(&count_mutex, NULL);
pthread_create(&thread_a, NULL, find_primes, NULL);
pthread_create(&thread_b, NULL, find_primes, NULL);
pthread_join(thread_a, NULL);
pthread_join(thread_b, NULL);
printf("%d\n", primes[PRIMES-1]);
return 0;
}
I reach 100+% CPU usage, yes, but there's a little problem: it has nearly the same performance characteristics as the non-threaded version. In other words, it takes nearly exactly the same amount of time.
This, I don't understand. Here's why I thought I would have a performance gain.
Yes, one thread cannot make too much progress past another. This is because whether a number is prime depends on the filters being in place for every prime number that comes before it. However, when a prime number is reached, it takes a long time to prove it, and in that time another thread should be able to disqualify quite a few numbers before reaching the next prime. The only waiting it has to do is when it thinks it has found a prime: it's either a prime or a multiple of the prime the other thread is working on.
So my general strategy for each thread was this:
I wish I knew why this doesn't perform like I think it should.
Posted Oct 07, 2007, in the morning.