Heap Sort in Python

Heap Sort is a simple and efficient sorting algorithm based on the binary heap data structure. It is split into two significant stages: constructing a max heap (or min heap, in case of a reversal in the sorting order) and then repeatedly extracting the largest (or smallest, in case of reverse sort order) element from the heap to build up the sorted array.

Here’s how Heap Sort works, described step-by-step in Python:

Key Concepts

  1. Binary Heap:
  • A binary heap is a binary tree that has all the elements of the heap property satisfied.
    • Max Heap: Parent node should be greater or equal to that of children node
    • Min Heap: Parent should be lesser in value compared with children node 
  • Heap can typically be represented using an array that 
    • Left child of index i is at 2*i + 1.
    • Right child of index i is at 2*i + 2.
    • Parent of index i is at (i-1)//2.

2. Heapify Operation:

  • It maintains the heap property. This is done recursively by comparing a node with its children and swapping them if necessary.

3. Algorithm Workflow:

  • Build a Max Heap: Rearrange the input array into a max heap.
  • Elements Extraction: Move the root to the end of the array repeatedly and decrease the size of the heap, then re-heapify the rest of the elements.

Python Implementation

Here’s the code for Heap Sort with detailed comments:

def heapify(arr, n, i):
    """
    Ensures the subtree rooted at index 'i' follows the max heap property.
    :param arr: List of elements
    :param n: Size of the heap
    :param i: Index of the root node of the subtree
    """
    largest = i  # Initialize largest as root
    left = 2 * i + 1  # Left child
    right = 2 * i + 2  # Right child

    # If left child is larger than root
    if left < n and arr[left] > arr[largest]:
        largest = left

    # If right child is larger than largest so far
    if right < n and arr[right] > arr[largest]:
        largest = right

    # If largest is not root
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]  # Swap

        # Recursively heapify the affected subtree
        heapify(arr, n, largest)


def heap_sort(arr):
    """
    Perform Heap Sort on the given array.
    :param arr: List of elements to be sorted
    """
    n = len(arr)

    # Build a max heap
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)

    # Extract elements from the heap
    for i in range(n - 1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]  # Move current root to end
        heapify(arr, i, 0)  # Heapify the reduced heap


# Example Usage
if __name__ == "__main__":
    data = [12, 11, 13, 5, 6, 7]
    print("Original array:", data)
    heap_sort(data)
    print("Sorted array:", data)

Detailed Explanation of Code

  1. Building the Max Heap:
  • Starting from the last non-leaf node and heapify.
  • Nodes from index n//2 - 1 to 0 are non-leaf nodes.
  • This makes sure that the whole array obeys the max heap property.

2. Heapify Function:

  • Check if the left or right child is greater than the root.
  • If so, swap them and recursively call heapify on the affected subtree.

3. Sorting:

  • Swap the root (largest element) with the last element of the array.
  • Reduce the size of the heap by one.
  • Re-heapify the remaining heap to maintain the max heap property.

4. Time Complexity:

  • Building the heap: O(n).
  • Extracting elements: O(nlog⁡n).
  • Total: O(nlogn).

5. Space Complexity:

  • In-place algorithm: O(1) auxiliary space.

Output Example

Original array: [12, 11, 13, 5, 6, 7]
Sorted array: [5, 6, 7, 11, 12, 13]

This code sorts the array in ascending order. To sort in descending order, use a min heap instead of a max heap by adjusting the comparison operators in heapify.

Advantages and Disadvantages

Advantages:

  1. In-place sorting (requires no extra space).
  2. O(nlogn) worst-case time complexity.

Disadvantages:

  1. Not stable (relative order of equal elements may not be preserved).
  2. Performance is generally slower than Quick Sort for real-world inputs.

Heap Sort is a sorting algorithm that is both systematic and efficient, especially useful for datasets that don’t require stability and don’t have extra memory. It works well in scenarios like implementing priority queues or sorting large datasets in place.