Search and sort
In this introduction we study the most elementary algorithms for searching and sorting. They are less efficient, but easier to understand. The faster ones rely on more advanced knowledge that is beyond the objectives of this course. There is some probability theory behind such problems, but we skip it for now.
The usual way to teach these algorithms is by showing a sequence of values and performing a search or sort operation by hand. After that we translate every step taken during the manual search or sort into a sequence of the algorithm's operations. At the end we can do a simplified analysis of the algorithm's performance, nothing that requires deep mathematical knowledge: just a quick evaluation of the worst case (greatest number of operations), the best case (least number of operations) and an estimation of the average case.
In the performance analysis we make a proportional estimation. That is, the speed at which the algorithm is executed doesn't matter, because that depends on the processor; what really matters is the number of operations performed.
Knowing logarithms and the principle of mathematical induction is required to follow the performance demonstrations of the search and sort algorithms.
- Sequential search
/* Receives an array, the array's number of elements and some value x
   If x is found in the array, return 1
   Else, return -1 */
int sequential_search (int array[], int n, int x) {
    int i;
    for (i = 0; i < n; i++)
        if (array[i] == x) return 1; /* found: stop scanning */
    return -1; /* scanned the whole array without finding x */
}
We have a set of values and want to know whether x is in it or not. Suppose the set is unsorted: the elements are randomly distributed. In such a case there is only one way to know whether the searched element is there or not, which is to compare it against the whole set, one element at a time. This is the worst scenario for a search.
Best case: one operation. The element is the first and is found in the first attempt.
Worst case: n operations, as many as the number of elements in the set. The searched element is the last or isn't contained in the set.
Average: on average the searched element is found halfway through the scan, so the expected number of operations is (1 + 2 + ... + n)/n = (n + 1)/2.
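The sum behind that average, [math]\displaystyle{ 1 + 2 + \dots + n = \frac{n(n + 1)}{2} }[/math], is proved by induction as follows: for [math]\displaystyle{ n = 1 }[/math] both sides equal 1; assuming the formula holds for some n, adding the next term gives [math]\displaystyle{ \frac{n(n + 1)}{2} + (n + 1) = \frac{(n + 1)(n + 2)}{2} }[/math], which is the same formula with n replaced by n + 1.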
- Binary search (the sequence must be already sorted!)
/* Receives an array, the number of elements and a value x to be found
   If found, return the element's index
   Else, return -1 */
int binary_search (int a[], int n, int x) {
    int start = 0, end = n - 1, middle;
    while (start <= end) {
        middle = (start + end) / 2;
        if (a[middle] == x) return middle;
        if (a[middle] > x) end = middle - 1; /* x can only be in the left half */
        else start = middle + 1;             /* x can only be in the right half */
    }
    return -1;
}
The decision "if middle is greater, go to the left" can be inverted to "if middle is less than, go to the right", it doesn't matter. The algorithm doesn't count the number of comparisons made, but it's pretty easy to add that. However, as the function can only return "found" or "not found", we'd have to use a pointer to be able to return the number of comparisons made and the result of the search itself.
Let's search for the number 39 in an array of 9 elements:
First step: add the first and last index and halve, (0 + 8) / 2 = 4; compare a[4] = 9 with 39.

index | 0 | 1 | 2 | 3 | 4 | 5  | 6  | 7  | 8
value | 1 | 4 | 6 | 8 | 9 | 11 | 39 | 56 | 78
Second step: 39 > 9, so the number cannot be to the left of 9; continue to the right, also discarding the element from the previous comparison, the number 9. The boundaries are now 5 and 8, so (5 + 8) / 2 = 6, and a[6] = 39 is found.

index | 5  | 6  | 7  | 8
value | 11 | 39 | 56 | 78
Just repeat this process for any searched element. If the quotient isn't an integer, don't round: truncate, that is, keep the integer part and discard the fractional part (which is exactly what C's integer division in (start + end) / 2 does). The search narrows until the interval holds a single element, that is (index + index) / 2 or simply start = end, which happens in two cases: the element is the last one left to compare, or it is nowhere to be found and the boundaries cross, ending the loop.
Worst case: [math]\displaystyle{ 1 + \log_2{n} }[/math] operations. The searched element is the last one compared or is nowhere to be found. Let's do a quick deduction (do it on paper!): begin with any number of elements and double the quantity; the number of comparisons increases by just one. Try a larger increase and square the number of elements; the number of comparisons merely doubles (the increase is proportional to the power's exponent). Take notice: while the number of elements grows geometrically, the number of comparisons grows linearly, much more slowly.
Let's repeat the reasoning, but this time "reversed": at each step of the binary search, half of the sequence is discarded, which yields the geometric progression [math]\displaystyle{ n/2 }[/math], [math]\displaystyle{ n/4 }[/math], ..., [math]\displaystyle{ n/2^k }[/math]. The search's limit is 1, because when the table is reduced to 1 element the halving stops. We then have that [math]\displaystyle{ n/2^k }[/math] is at most 1, or [math]\displaystyle{ n/2^k \leq 1 }[/math]. Applying some algebra gives [math]\displaystyle{ n \leq 2^k }[/math], and applying the definition of logarithm gives [math]\displaystyle{ \log_2{n} \leq k }[/math], where n is the number of elements and k the number of comparison operations.
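To check the algebra empirically, here is a small sketch of ours (not part of the original material) that just counts the halving steps for a given n:

#include <stdio.h>

/* Counts how many times n elements can be halved until one element remains;
   plus the final comparison, this is binary search's worst case */
int halving_steps (int n) {
    int k = 0;
    while (n > 1) {
        n = n / 2; /* integer division truncates, like (start + end) / 2 */
        k++;
    }
    return k + 1; /* the last remaining element is still compared once */
}

int main (void) {
    printf("%d\n", halving_steps(9));    /* prints 4 */
    printf("%d\n", halving_steps(1024)); /* prints 11 = 1 + log2(1024) */
    return 0;
}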
An analogy: a dictionary is a sorted sequence of words. If we searched for a word sequentially, we'd be forced to browse the dictionary page by page until the word was found. With binary search we can open the dictionary directly at the letter the word begins with; from there we can reduce the search boundaries to that letter, discarding the rest of the dictionary. Repeating the same criterion for the word's second letter further reduces the boundary, discarding many more pages. In numerical terms, the algorithm behaves as a geometric progression of ratio 1/2. It's not possible to increase the division ratio to 1/3 or more, because each comparison gives us only two ways to go: up or down, left or right.
- Selection sort
/* Receives two variables' memory addresses and swaps their contents */
void swap (int *pa, int *pb) {
    int temp;
    temp = *pa;
    *pa = *pb;
    *pb = temp;
}
/* Receives an array and a number of elements
   Sorts the given number of elements in ascending order */
void selection_sort (int a[], int n) {
    int i, j, imin;
    for (i = 0; i < n - 1; i++) {
        imin = i; /* index of the smallest element found so far */
        for (j = i + 1; j < n; j++)
            if (a[imin] > a[j]) imin = j;
        swap(&a[imin], &a[i]); /* place the smallest at position i */
    }
}
Let's sort a sequence of 6 values:
index | start | 1° swap | 2° swap | end
  0   |  12   |    1    |    1    |   1
  1   |   5   |    5    |    5    |   5
  2   |   8   |    8    |    6    |   6
  3   |  33   |   33    |   33    |   8
  4   |   1   |   12    |   12    |  12
  5   |   6   |    6    |    8    |  33
First step: find the smallest element, beginning the search from the second. Swap it with the first of the list.
Second step: find the smallest element, beginning the search from the third. Swap it with the second of the list.
Continue till the sorting is done. In the example, three effective swaps were enough to sort the sequence (the "end" column shows the result of the third).
If there are repeated values, the algorithm picks the first one found. Notice that the number of effective swap operations doesn't match the number of the loop's iterations. For example, in the first iteration we swapped 1 and 12; in the second, the number 5 was already in its right place, so nothing moved.
It's up to you to track this by rows or by columns.
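To see those two quantities side by side, here is an instrumented sketch of ours that counts comparisons and effective swaps separately (a swap is counted only when it actually moves two distinct elements):

#include <stdio.h>

/* Selection sort that reports its comparison count and effective swap count */
void selection_sort_count (int a[], int n, int *comparisons, int *swaps) {
    int i, j, imin, temp;
    *comparisons = 0;
    *swaps = 0;
    for (i = 0; i < n - 1; i++) {
        imin = i;
        for (j = i + 1; j < n; j++) {
            (*comparisons)++;
            if (a[imin] > a[j]) imin = j;
        }
        if (imin != i) { /* skip swapping an element with itself */
            temp = a[i]; a[i] = a[imin]; a[imin] = temp;
            (*swaps)++;
        }
    }
}

int main (void) {
    int a[] = {12, 5, 8, 33, 1, 6};
    int comparisons, swaps;
    selection_sort_count(a, 6, &comparisons, &swaps);
    printf("%d comparisons, %d swaps\n", comparisons, swaps); /* 15 comparisons, 3 swaps */
    return 0;
}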
Best case: the sequence is already sorted. Notice that the only way to know whether the sequence is sorted is by scanning it; the algorithm cannot tell whether some element is out of place without scanning the whole sequence first.
Worst case: the sequence is reversed, giving n/2 effective swap operations. In spite of that, we still have (n - 1) + (n - 2) + ... + 2 + 1 = n.(n - 1)/2 comparison operations to decide whether the elements will be swapped or not.
Average: the algorithm, whether the sequence is sorted or not, scans all elements, skipping the first, then the second, and so on. Therefore we have (n - 1) + (n - 2) + ... + 2 + 1 = n.(n - 1)/2 comparison operations. The proof is by induction, with the same sum used for the sequential search's average.
Proportional time: in all three cases the time is proportional to [math]\displaystyle{ n^2 }[/math].
- Bubble sort
/* Receives an array and the number of elements. Sorts using bubble sort */
void bubble_sort (int a[], int n) {
    int i, j;
    for (i = 1; i < n; i++) {
        j = i;
        while (j > 0 && a[j] < a[j - 1]) { /* rise a[j] while it's smaller than its predecessor */
            swap(&a[j], &a[j - 1]);
            j--;
        }
    }
}
Let's sort the same sequence as before, now with bubble sort:
index | start | 1° swap | 2° swap | 3° swap | 4° swap | 5° swap | 6° swap | 7° swap | 8° swap | end
  0   |  12   |    5    |    5    |    5    |    5    |    5    |    1    |    1    |    1    |   1
  1   |   5   |   12    |    8    |    8    |    8    |    1    |    5    |    5    |    5    |   5
  2   |   8   |    8    |   12    |   12    |    1    |    8    |    8    |    8    |    8    |   6
  3   |  33   |   33    |   33    |    1    |   12    |   12    |   12    |   12    |    6    |   8
  4   |   1   |    1    |    1    |   33    |   33    |   33    |   33    |    6    |   12    |  12
  5   |   6   |    6    |    6    |    6    |    6    |    6    |    6    |   33    |   33    |  33
First step: compare the first element with the second; if the second is less, swap them. Else, do nothing.
Second step: compare the second with the third; if the third is less, swap them. If it's then also less than the first, the element rises one more position; else, it stays where it is. Be careful! The algorithm doesn't rise more than one element at a time. Once an element starts rising, it keeps going until it hits a "ceiling": an element smaller than it, or the beginning of the sequence.
Another way to see this algorithm in action is to track it "upside down": consider the sequence 1 2 3 4 5. Pick the number one and "sink" it to the last position; we'll have 5 2 3 4 1. How many positions did the number descend? Four. Repeat with the number two, but don't sink it all the way to the last position; that is, descend it three positions.
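That "upside down" view corresponds to the more traditional formulation of bubble sort, in which each pass sinks the largest remaining element to the end through adjacent swaps. A sketch of that variant, ours for comparison, reusing the swap function defined earlier:

/* Classic formulation: pass i sinks the largest remaining element
   to position n - 1 - i through swaps of adjacent elements */
void bubble_sort_sink (int a[], int n) {
    int i, j;
    for (i = 0; i < n - 1; i++)
        for (j = 0; j < n - 1 - i; j++)
            if (a[j] > a[j + 1])
                swap(&a[j], &a[j + 1]);
}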
Best case: the sequence is already sorted. Notice that the only way to know whether the sequence is sorted is by scanning it; the algorithm cannot tell whether some element is out of place without scanning the whole sequence first.
Worst case: the sequence is reversed. There'll be (n - 1) + (n - 2) + ... + 2 + 1 = n.(n - 1)/2 swap operations. Notice that, compared against selection sort, bubble sort requires as many swap operations as selection sort's comparison operations.
Average: whether the sequence is sorted or not, the algorithm scans all elements, swapping each one through successive positions until it finds its right place. On average about half the worst case's swaps occur, so the average case has the same order of growth as the worst.
Proportional time: in all three cases the time is proportional to [math]\displaystyle{ n^2 }[/math].
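Finally, a minimal driver of ours that exercises both sorts on the example sequence, assuming the function names used in the sketches above:

#include <stdio.h>

int main (void) {
    int a[] = {12, 5, 8, 33, 1, 6};
    int b[] = {12, 5, 8, 33, 1, 6};
    int i;

    selection_sort(a, 6); /* sorts one copy with selection sort */
    bubble_sort(b, 6);    /* sorts another copy with bubble sort */

    for (i = 0; i < 6; i++) printf("%d ", a[i]); /* 1 5 6 8 12 33 */
    printf("\n");
    for (i = 0; i < 6; i++) printf("%d ", b[i]); /* 1 5 6 8 12 33 */
    printf("\n");
    return 0;
}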