In this post I am sharing a code kata we did recently at Sinbadsoft: sorting large files with limited memory. By large files I mean files that would not fit in the available amount of RAM. A regular sort algorithm will not perform well in this case as the whole data cannot be loaded at once in memory. The solution of this classic problem is external sorting. It consists of two steps: first, split the file into small chunks that would fit in memory, load each chunk, sort it, and write it back on disk. Second, perform a k-way merge on all the sorted chunks to get the final result.
Continue reading