Design a multi-threaded merge sort with multiple parallelism strategies, benchmarked against a sequential baseline.
Implement four variants:
(1) Sequential Merge Sort — standard divide-and-conquer with an insertion sort cutoff at n ≤ 32 for cache efficiency.
(2) Parallel Thread-per-Split — fork a new thread for the left half, compute the right half in the current thread, then join; depth-limited to max_depth levels, so at most 2^max_depth threads run concurrently.
(3) Thread Pool Merge Sort — submit tasks to a fixed-size ThreadPoolExecutor with a sequential threshold (e.g., 10,000 elements); below the threshold, sort sequentially to avoid task overhead.
(4) Fork-Join Merge Sort — SortTask modelled as a RecursiveTask: fork() the left half, compute() the right, join() the left; inspired by Java's work-stealing ForkJoinPool.
All implementations share a thread-safe SortMetrics class that atomically tracks comparisons, threads used, and recursion depth.
Sequential merge sort (baseline)
Standard single-threaded divide-and-conquer merge sort with insertion sort cutoff at n ≤ 32 for cache efficiency
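The baseline above might be sketched in Python as follows (a minimal sketch; function names like `merge_sort_seq` are illustrative, the n ≤ 32 cutoff is the one stated in the design):

```python
INSERTION_CUTOFF = 32   # switch to insertion sort at or below this size

def insertion_sort_range(a, lo, hi):
    """Sort a[lo..hi] in place; fast for small, cache-resident slices."""
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def merge_runs(a, lo, mid, hi):
    """Merge sorted runs a[lo..mid] and a[mid+1..hi] via temporary buffers."""
    left, right = a[lo:mid + 1], a[mid + 1:hi + 1]
    i = j = 0
    k = lo
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            a[k] = left[i]; i += 1
        else:
            a[k] = right[j]; j += 1
        k += 1
    while i < len(left):
        a[k] = left[i]; i += 1; k += 1
    while j < len(right):
        a[k] = right[j]; j += 1; k += 1

def merge_sort_seq(a, lo=0, hi=None):
    """Sequential baseline: recursive merge sort with an insertion sort cutoff."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= INSERTION_CUTOFF:
        insertion_sort_range(a, lo, hi)
        return
    mid = (lo + hi) // 2
    merge_sort_seq(a, lo, mid)
    merge_sort_seq(a, mid + 1, hi)
    merge_runs(a, lo, mid, hi)
</imports>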
Parallel thread-per-split
At each divide step, fork a new thread for the left half and compute the right half in the current thread; depth-limited to avoid thread explosion (2^d threads at depth d)
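A hedged Python sketch of the thread-per-split strategy (note that CPython's GIL prevents real CPU-bound speedup from threads, so this illustrates the structure of the strategy rather than a guaranteed win; `heapq.merge` stands in for a hand-written merge):

```python
import heapq
import threading

def threaded_merge_sort(a, depth=0, max_depth=3):
    """Thread-per-split sketch: the left half is sorted in a freshly forked
    thread while the current thread sorts the right half. Recursion past
    max_depth falls back to sequential sorted(), bounding forked threads
    at 2**max_depth - 1."""
    if len(a) <= 1:
        return list(a)
    if depth >= max_depth:
        return sorted(a)                      # depth limit hit: stay sequential
    mid = len(a) // 2
    left_out = []
    forked = threading.Thread(
        target=lambda: left_out.extend(
            threaded_merge_sort(a[:mid], depth + 1, max_depth)))
    forked.start()                            # fork: left half in a new thread
    right = threaded_merge_sort(a[mid:], depth + 1, max_depth)  # right half here
    forked.join()                             # wait for the left half
    return list(heapq.merge(left_out, right)) # combine the two sorted runs
```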
Thread pool merge sort
Submit sort tasks to a fixed-size ThreadPoolExecutor; avoids thread creation/destruction overhead; sequential fallback below threshold
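One safe way to express this with Python's `concurrent.futures.ThreadPoolExecutor` is to split the input into one chunk per worker, sort chunks in parallel, and merge the sorted runs pairwise; since every submitted task is independent, no worker ever blocks on another task's result (a sketch; names are illustrative):

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

SEQ_THRESHOLD = 10_000   # sequential cutoff from the design above

def pool_merge_sort(a, workers=4):
    """Thread-pool merge sort: chunk, sort chunks in parallel, then run
    pairwise merge rounds until a single sorted run remains."""
    if len(a) <= SEQ_THRESHOLD:
        return sorted(a)                      # small input: skip the pool entirely
    chunk = -(-len(a) // workers)             # ceil division
    parts = [a[i:i + chunk] for i in range(0, len(a), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(sorted, parts))  # sort all chunks in parallel
        while len(runs) > 1:                  # pairwise merge rounds
            pairs = [(runs[i], runs[i + 1]) for i in range(0, len(runs) - 1, 2)]
            merged = list(pool.map(lambda p: list(heapq.merge(*p)), pairs))
            if len(runs) % 2:                 # odd run carries into next round
                merged.append(runs[-1])
            runs = merged
    return runs[0]
```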
Fork-Join merge sort
SortTask as RecursiveTask: fork() left subtask, compute() right in current thread, join() left; Java ForkJoinPool-inspired with work stealing
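Python has no direct `ForkJoinPool` equivalent, but the fork/compute/join shape can be mimicked with an executor (a sketch under that assumption; since `ThreadPoolExecutor` does not steal work, the pool must be sized so every forked task gets its own thread, otherwise a join inside a worker can deadlock):

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def fj_sort(a, pool, depth=0, max_depth=2):
    """fork() left, compute() right, join() left -- the RecursiveTask shape.
    Forks happen only in the top max_depth levels, producing 2**max_depth - 1
    tasks; a pool of 2**max_depth workers therefore runs them all without
    any join blocking forever."""
    if depth >= max_depth or len(a) <= 1:
        return sorted(a)                      # sequential below the fork depth
    mid = len(a) // 2
    left = pool.submit(fj_sort, a[:mid], pool, depth + 1, max_depth)  # fork()
    right = fj_sort(a[mid:], pool, depth + 1, max_depth)              # compute()
    return list(heapq.merge(left.result(), right))                    # join() + merge
```

Java's real `ForkJoinPool` avoids the sizing constraint because an idle or joining worker steals queued subtasks; that is the property the design's work-stealing remark refers to.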
Depth-limited parallelism
Only fork new threads/tasks for the top max_depth levels of recursion; below that, fall back to sequential sort to avoid task overhead
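The thread bound can be made concrete with a small counting sketch (the function is illustrative, not part of the design): one fork per split in the top max_depth levels gives 2^max_depth − 1 forked threads or tasks in total for a large enough input.

```python
def forks_spawned(n, depth=0, max_depth=3):
    """Count the threads/tasks a depth-limited sort would fork: one per split
    in the top max_depth levels, i.e. 2**max_depth - 1 for large n."""
    if depth >= max_depth or n <= 1:
        return 0                              # below the limit: sequential, no fork
    half = n // 2
    return (1                                 # the fork at this split
            + forks_spawned(half, depth + 1, max_depth)
            + forks_spawned(n - half, depth + 1, max_depth))
```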
Sequential threshold cutoff
Below a configurable threshold (e.g., 10,000 elements), sort sequentially to avoid task submission overhead exceeding parallel benefit
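The cutoff decision can be sketched as a dispatch guard (illustrative names; this sketch assumes the pool has enough idle workers for the nested joins, since `ThreadPoolExecutor` does not steal work):

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

SEQ_THRESHOLD = 10_000   # the design's example value; tune per machine

def sort_with_cutoff(a, pool):
    """Below the threshold, task-submission overhead outweighs any parallel
    gain, so sort inline; above it, offload the left half to the pool."""
    if len(a) <= SEQ_THRESHOLD:
        return sorted(a)                      # inline: no submission overhead
    mid = len(a) // 2
    left = pool.submit(sort_with_cutoff, a[:mid], pool)
    right = sort_with_cutoff(a[mid:], pool)
    return list(heapq.merge(left.result(), right))
```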
Insertion sort for small subarrays
Switch to insertion sort for n ≤ 32; better cache locality and constant-factor performance for small arrays
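For reference, the small-array routine itself is short (a standard in-place insertion sort; the ~32-element crossover point is the design's figure and varies by machine):

```python
def insertion_sort(a):
    """In-place insertion sort: quadratic compares, but a tight loop over a
    small contiguous slice, which in practice beats recursive merging for
    n up to around 32."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:          # shift larger elements right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a
```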
Thread-safe metrics collection
Atomic counters track comparisons, threads used, and max recursion depth across concurrent tasks
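A possible shape for the shared metrics object (CPython has no atomic integer type, so a `threading.Lock` stands in for Java's `AtomicLong`/`AtomicInteger`; method names are illustrative):

```python
import threading

class SortMetrics:
    """Thread-safe counters shared by all sort variants."""
    def __init__(self):
        self._lock = threading.Lock()
        self.comparisons = 0
        self.threads_used = 0
        self.max_depth = 0

    def add_comparisons(self, n=1):
        with self._lock:                      # lock makes read-modify-write atomic
            self.comparisons += n

    def record_thread(self):
        with self._lock:
            self.threads_used += 1

    def record_depth(self, depth):
        with self._lock:                      # keep the running maximum
            self.max_depth = max(self.max_depth, depth)
```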
Benchmarking and correctness verification
Compare sequential vs parallel implementations on various array sizes; verify sorted output matches expected
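A minimal harness for this step might look like the following (a sketch: it expects a sorting callable that returns a new sorted list, and checks correctness against Python's built-in `sorted()`):

```python
import random
import time

def benchmark(sort_fn, sizes=(1_000, 50_000)):
    """Time a sorting callable on random input of each size, verify its
    output, and return {size: elapsed_seconds}."""
    results = {}
    for n in sizes:
        data = [random.randint(0, n) for _ in range(n)]
        expected = sorted(data)
        start = time.perf_counter()
        out = sort_fn(list(data))             # copy so sort_fn may mutate freely
        elapsed = time.perf_counter() - start
        assert out == expected, f"incorrect output for n={n}"
        results[n] = elapsed
    return results
```

Running it over all four variants at several sizes makes the crossover visible: small inputs favor the sequential baseline, while the parallel variants only pay off once the work per task dominates the coordination overhead.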
Before diving into code, clarify the use cases and edge cases. Understanding the problem deeply leads to better class design.
Identify the primary actions users will perform. For a parking lot: park vehicle, exit vehicle, check availability. Each becomes a method.
Who interacts with the system? Customers, admins, automated systems? Each actor type may need different interfaces.
What are the limits? Max vehicles, supported vehicle types, payment methods. Constraints drive your data structures.
What happens on overflow? Concurrent access? Payment failures? Thinking about edge cases reveals hidden complexity.
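As a sketch of how these questions shape a class, the parking-lot actions above might map onto methods like this (all names hypothetical, not from any specific design):

```python
class ParkingLot:
    """Illustrative skeleton: each user action becomes a method, and the
    capacity constraint drives the overflow check."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._occupied = {}                   # plate -> vehicle type

    def park_vehicle(self, plate, vehicle_type):
        if len(self._occupied) >= self.capacity:
            return False                      # overflow edge case: lot is full
        self._occupied[plate] = vehicle_type
        return True

    def exit_vehicle(self, plate):
        return self._occupied.pop(plate, None)

    def check_availability(self):
        return self.capacity - len(self._occupied)
```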