In this post I am sharing a code kata we did recently at Sinbadsoft: sorting large files with limited memory. By large files I mean files that would not fit in the available amount of RAM. A regular sort algorithm will not perform well in this case as the whole data cannot be loaded at once in memory. The solution of this classic problem is external sorting. It consists of two steps: first, split the file into small chunks that would fit in memory, load each chunk, sort it, and write it back on disk. Second, perform a k-way merge on all the sorted chunks to get the final result.
A few days ago, I needed to encode numeric identifiers in a short and url-safe format. Something similar to what url shorteners use (e.g. SlvUp7 in http://bit.ly/SlvUp7). Encoding the ids in base 64 would work if an alternative alphabet is provided for the non url-safe symbols. But since I wanted to have only alpha-numeric characters, I chose to use base 62 instead.
While working on this problem, I had the idea of the coding exercise I’m sharing here: a utility for converting back and forth base 10 numerals to strings of base X equivalent –just like itoa and atoi in C (except that atoi doesn’t take a base parameter). In order to make things a bit more interesting, I decided to learn myself some Scala and translate the code to functional style. Both implementations are detailed here and available online under the terms of the Apache version 2 license.
At the end, I didn’t resist the temptation to do a C# vs. Scala performance benchmark. I also coded and added a Java implementation to the tests in order to distinguish the performance change due to switching from the CLR to the JVM, and the one introduced by the translation to functional style and to the Scala runtime.
In this post I am sharing another code Kata: an implementation of a binary heap. Once we have the heap implemented, we will easily deduce a heap sort and a priority queue based on it. It takes about 100 lines of C# code.
I find merge sort elegant and easy to implement and to understand for both iterative and recursive approaches. In this post I’ll share a quick (and probably dirty) iterative and recursive implementations of merge sort. Both versions share exactly the same merge operation. The implementation takes less than 30 lines of C#.