Monday, 6 April 2009

C# and Memoization

I'm finally beginning to understand why people like C# - it's familiar for C++-programmers and it includes some of those powerful constructs from functional programming I was always missing when writing code in C++. Not having to worry about freeing memory is nice, too.

Imagine my excitement when we had to write an easy integer factorizer to simulate a workload task related to threading. I knew I could do closures, so it would be trivial to do memoization to speed up this task, too.

Let's start with the following simple function to check if a number is a prime number. Note the caching of the square root - otherwise it would calculate the square root with every loop - and that's expensive.
static bool isPrim(ulong n) {
if (0 == (n % 2)) {
return false;
}
double x = Math.Sqrt(n);
for (ulong i = 3; i <= x; i += 2) {
if (0 == (n % i)) {
return false;
}
}
return true;
}


We can easily transform this to a generated function - delegates are nice and timer benchmarks show no measureable performance loss. There are no changes to other code necessary.

The generated function is needed to have a closure. This means any variables inside the generator functions will be stored with the generated function and can be accessed and modified when the function is called. Imagine for instance a counter function, where this behaviour is very useful.

But for now we only rewrite the function to be generated.
static isPrimDelegate isPrim = isPrimGenerator();
delegate bool isPrimDelegate(ulong n);

static isPrimDelegate isPrimGenerator() {
return delegate(ulong n)
{
if (0 == (n % 2)) {
return false;
}
double x = Math.Sqrt(n);
for (ulong i = 3; i <= x; i += 2) {
if (0 == (n % i)) {
return false;
}
}
return true;
};
}


We can easily see that the code could be better. It shouldn't test division for 15 for instance, because everything dividable by 15 is dividable by 3 and 5, too. We could make it faster by only checking for division by prime numbers - and that's just what memoization helps us to do.

We'll save a list with prime numbers along with the function and check only for division by these numbers. This way, we'll have the work of finding primes only once and we'll get a nice speed up.

As you see, adding the list to the function is as simple as declaring a variable in the generator function. We add the primes 2 and 3 to make the later code simpler, adding more primes up-front didn't result in any faster code for me.

Now we check for division with all primes in the list. If we reach the end of the list before we checked all primes in the range, we lazily search for the next prime. We still check for 15 when searching for primes, but this will be done only once and not every time.
static isPrimDelegate isPrim = isPrimGenerator();
delegate bool isPrimDelegate(ulong n);

static isPrimDelegate isPrimGenerator() {
List prims = new List();
prims.Add(2);
prims.Add(3);
return delegate(ulong n)
{
int index = 0;
ulong m = prims[index];
double x = Math.Sqrt(n);
while (m <= x) {
if (0 == (n % m)) {
return false;
}
if (prims.Count == index) {
int index2 = 0;
bool prim = false;
while (!prim) {
m += 2;
prim = true;
double x2 = Math.Sqrt(m);
while (prims[index2] <= x2) {
if (0 == (m % prims[index2])) {
prim = false;
break;
}
index2++;
}
}
prims.Add(m);
} else {
m = prims[index++];
}
}
return true;
};
}

We end up with faster code by trading memory for CPU time. In most cases it's worth it as the second call to a function will be near-instanteous, if all primes up to the square root of the parameter have already been cached.

This can also be applied to other functions, like costly data base lookups where the result doesn't change, but is needed multiple times. As it is transparent to the caller it's an obvious optimization to make, when profiling finds a function slowing down the code.

Impressions of Squeak

It's slow. Granted, this was on a 450 MHz system which I accessed via X-forwarding, but I'll better prepare the image on my faster computer and try to re-assess performance on a headless session.

Apart from being slow, it has also questionable looks with very bright colours and a comic-style font. But behind these looks...there's a system which seems very well thought-out and is intuitive to use for a first time user.

Installation of programs is made easy by an integrated package manager. Everything is graphical and clear, although it can be confusing with many windows opened.

Now I'll have to learn more about Smalltalk and using Squeak before I have a closer look at AIDA/Web and Seaside.

Thursday, 2 April 2009

First stop - Smalltalk

The first frameworks I'll look at will be written in Smalltalk. I set up an old PC (450MHz Celeron, 256MB RAM) with Debian Lenny and installed Squeak and also got an old (1996, pre Squeak) Smalltalk book from the library to learn enough of the language.

A short look at the two available frameworks:

Seaside

This framework differs from others by managing state using continuations, which makes it very interesting. It seems to attract developers as shown by projects such as the CMS pier and the related meta-description framework magritte (If I understand it correctly, it servers to create re-usable models of re-occuring tasks like data validation.). Recent versions advertise a big reduction of memory usage, which seems to have been a problem in the past.

20090410: To gain a better understanding of continuations in Seaside, have a look at Julian Fitzell's blog post. It's very interesting, thanks to him for pointing to it.

AIDA/Web

AIDA/Web was first presented in 1996 and still actively in development. It seems to be very mature and powers for instance the Squeak home page. It is based on MVC, supports Ajax and even includes a web server. It advertises integrated session and security support and REST-like urls, but I have yet to find out how they do it. It claims simplicity yet to allow for complex applications, just like the language Smalltalk.

Wednesday, 1 April 2009

Where the journey starts and where it might lead.

We're at day 0 of this project, so let's have a look at where I stand now.

I'm a student of commercial information systems. During my studies and in intern and open source jobs, I've mostly worked with multi-platform or Unix/Linux applications in C/C++, but also touched Java and C#. In the time before, I've briefly worked with XHTML, CSS, PHP and MySQL during an internship, back when nobody really knew what AJAX meant.

About two years ago, due to a strong interest in linguistics, I became interested in functional programming and experimented with languages like Erlang, Haskell and finally Common Lisp.

You should therefore expect lesser known frameworks on my list, written in more obscure languages. This includes Furnace written in Factor or Seaside written in Smalltalk.

Depending on how I like a framework and the language it's written in, I might not write much about it. And I don't know yet what to write, but we're only at day 0 and there's no reason to worry about this now. Let's see, I'm looking forward to this experience.