The Swift system sort algorithm is based on a python sorting method invented by Tim Peters in 2002, and named Timsort after the author.
The original Timsort is described here: https://svn.python.org/projects/python/trunk/Objects/listsort.txt
Bottom line: We can think of Timsort as a sorting method that builds an array of subarrays, where the subarrays are optimized for sorting by Insertion sort. They are optimized to include runs of increasing values, or runs of decreasing values, or runs of a specific size that it good for Insertion sorting. After the array of subarrays (Runs) is created, the subarrays are sorted. And then the sorted subarrays are merged together to create a final, sorted array.
The performance of Timsort, when compared to the previous standard in python known as samplesort, is very comparable when sorting random arrays. But is much faster when sorting arrays that have a small amount of pre-existing order. (Think about a common use case with an array of users that might be very large, but also be close to sorted, until that new user appears.)
Hence for most use cases, consider whether the array is likely to exhibit some pre-existing order. In that case, the system’s Timsort is the way to go.
Timsort is a hybrid algorithm. If the size of the array is small, with a count of about 20, for example, then the array is immediately sorted with Insertion sort. The salient reason why is that Insertion sort has a best case speed of O(n). But the worst case is the unacceptable O(n2).
Let’s go into it in detail and see if we can come out the other side with a hard understanding of Timsort. After all of the other sort algorithms we’ve considered here, that should be possible.
1. Timsort begins to scan the array from the first element. In our examples, we always use Doubles as the values. Timsort looks at the Double at index 0.
2. As mentioned already, Timsort is looking for sequences that are almost sorted. But in the absence of that, will also define a sequence based on its length. So as an example, let’s say Timsort looks at the second element, and finds that it’s bigger than the first. And the third element is bigger than the second. This run of elements of increasing size continues to be added to the first formal ‘Run’ as long as the sizes continue to get bigger. We can also think of a ‘Run’ as a subarray.
3. And we can think of Timsort as a sorting method that builds an array of subarrays, where the subarrays are optimized for sorting by Insertion sort.
4. Timsort will also look for decreasing sizes, and as long as it continues to decrease, the elements will be added to a ‘Run.’ But in this case, the elements will be flipped from decreasing size to increasing size. And since this can be done in O(n) time, it’s no big deal.
5. If a ‘Run’ is not a minimum size appropriate for Insertion sort, then elements are added until the count of the run reaches a value between 32 and 64.
6. As mentioned, the ‘Runs’ are added to an array of runs.
7. The subarrays — the elements in the array of ‘Runs’ — are sorted by Insertion sort. This is key. Because when arrays are in a sorted or nearly sorted order, the speed of Insertion sort approaches O(n) — unbeatable.
8. After the sorting of each individual ‘Run,’ the ‘Runs’ are merged in a process like Merge sort. The Merge process can further optimized. We’ll consider that in the next entry.
Notifications