
Solution Approach

Solution Overview

There are 3 main approaches to solve this problem:

  1. Brute Force Approach - count every word and sort all of them
  2. Heap (Priority Queue) Approach - keep only the top k candidates in a heap
  3. Bucket Sort Approach - group words into buckets by frequency

Approach 1: Brute Force Approach

Let's start by understanding the problem: we need to find the k most frequent words in an array, sorted by frequency in descending order, with ties broken alphabetically. For example, given words = ["i", "love", "leetcode", "i", "love", "coding"] and k = 2, the answer is ["i", "love"]: both words appear twice, and "i" comes before "love" alphabetically.

Thinking Process: The most straightforward approach is to count the frequency of each word, sort all words by their frequencies, and then return the top k words.

First, we use a hash map to count the frequency of each word in the array. Then, we create a list of all unique words and sort them according to the problem's requirements: by frequency in descending order, and if frequencies are equal, by alphabetical order. Finally, we return the first k words from the sorted list.

Intuition: This approach directly follows the problem statement. We count frequencies, sort based on those frequencies and alphabetical order, and then return the top k words.

Algorithm:

  1. Create a hash map to store the frequency of each word in the array.
  2. Iterate through the array and count the frequency of each word.
  3. Create a list of all unique words from the hash map.
  4. Sort the list of words based on their frequencies in descending order, and if frequencies are equal, by alphabetical order.
  5. Return the first k words from the sorted list.
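
The steps above translate almost directly into code. Below is a minimal Python sketch of this brute-force approach; the function name top_k_frequent_brute and the use of collections.Counter are illustrative choices, not part of the original problem.

from collections import Counter

def top_k_frequent_brute(words, k):
    # Steps 1-2: count the frequency of each word.
    freq = Counter(words)
    # Step 3: collect all unique words.
    unique_words = list(freq.keys())
    # Step 4: sort by frequency descending, then alphabetically ascending.
    unique_words.sort(key=lambda w: (-freq[w], w))
    # Step 5: return the first k words.
    return unique_words[:k]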

Time Complexity:

O(m + n log n)

Where m is the length of the input array and n is the number of unique words. Counting frequencies takes O(m) time. Sorting takes O(n log n) time, and returning the top k words takes O(k) time. The overall time complexity is O(m + n log n), dominated by the sorting step.

Space Complexity:

O(n)

We use a hash map to store word frequencies, which takes O(n) space where n is the number of unique words. We also need O(n) space for the list of unique words. The overall space complexity is O(n).

Approach 2: Heap (Priority Queue) Approach

We can optimize the previous approach by using a heap (priority queue) to find the k most frequent words.

Thinking Process: Instead of sorting all words, we can use a min heap of size k to keep track of the k most frequent words. We first count the frequency of each word using a hash map. Then, we iterate through all unique words and maintain a min heap of size k, where words are ordered by their frequencies (and alphabetically if frequencies are equal).

For each word, we add it to the heap. If the heap size exceeds k, we remove the word with the lowest frequency (or highest alphabetical order if frequencies are equal). After processing all words, the heap contains the k most frequent words.

Intuition: A min heap of size k allows us to efficiently find the k largest elements in a collection. By maintaining a heap of size k, we avoid sorting all words, which is more efficient when k is much smaller than the number of unique words.

Algorithm:

  1. Create a hash map to store the frequency of each word in the array.
  2. Iterate through the array and count the frequency of each word.
  3. Create a min heap of size k, where words are ordered by their frequencies (and alphabetically if frequencies are equal).
  4. Iterate through all unique words:
     a. Add the word to the heap.
     b. If the heap size exceeds k, remove the word with the lowest frequency (or highest alphabetical order if frequencies are equal).
  5. Extract words from the heap in reverse order to get the k most frequent words sorted by frequency (and alphabetically if frequencies are equal).
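
A minimal Python sketch of this heap-based approach follows. Since Python's heapq module has no key or comparator argument, the sketch wraps each word in a small helper class (Entry, an illustrative name) whose ordering makes the min heap pop the weakest candidate first: lowest frequency, and among ties, the alphabetically larger word.

import heapq
from collections import Counter

class Entry:
    # Ordered so the min heap pops the weakest candidate first:
    # lowest frequency, then alphabetically larger word on ties.
    def __init__(self, word, freq):
        self.word = word
        self.freq = freq

    def __lt__(self, other):
        if self.freq != other.freq:
            return self.freq < other.freq
        return self.word > other.word

def top_k_frequent_heap(words, k):
    freq = Counter(words)
    heap = []
    for word, count in freq.items():
        heapq.heappush(heap, Entry(word, count))
        if len(heap) > k:
            heapq.heappop(heap)  # drop the current weakest candidate
    # Pop the surviving entries (weakest first) and reverse to get best first.
    result = [heapq.heappop(heap).word for _ in range(len(heap))]
    return result[::-1]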

Time Complexity:

O(m + n log k)

Where n is the number of unique words and k is the number of words to return. Counting frequencies takes O(m) time where m is the length of the input array. Adding each word to the heap takes O(log k) time, and we do this for n words, resulting in O(n log k) time. Extracting k words from the heap takes O(k log k) time. The overall time complexity is O(m + n log k + k log k), which simplifies to O(m + n log k) since k ≤ n.

Space Complexity:

O(n)

We use a hash map to store word frequencies, which takes O(n) space where n is the number of unique words. We also need O(k) space for the heap. The overall space complexity is O(n + k), which simplifies to O(n) since k ≤ n.

Approach 3: Bucket Sort Approach

Another approach is to use bucket sort to find the k most frequent words.

Thinking Process: Since the frequency of a word is at most the length of the input array, we can use an array of buckets where each bucket corresponds to a frequency. We place words with the same frequency in the same bucket, sorted alphabetically.

After placing all words in their respective buckets, we iterate through the buckets from highest frequency to lowest and collect the k most frequent words.

Intuition: Bucket sort is more efficient than general sorting algorithms when the range of values to be sorted is limited, which is the case here. By using buckets, we can avoid comparing frequencies during the sorting process.

Algorithm:

  1. Create a hash map to store the frequency of each word in the array.
  2. Iterate through the array and count the frequency of each word.
  3. Create an array of buckets where each bucket corresponds to a frequency.
  4. Place words with the same frequency in the same bucket, sorted alphabetically.
  5. Iterate through the buckets from highest frequency to lowest:
     a. For each bucket, add words to the result list in alphabetical order.
     b. Stop when we have collected k words.
  6. Return the result list.
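
A minimal Python sketch of this bucket-based approach is shown below; top_k_frequent_buckets is an illustrative name, and the alphabetical ordering within a bucket is obtained here by sorting each bucket as it is read.

from collections import Counter

def top_k_frequent_buckets(words, k):
    freq = Counter(words)
    # One bucket per possible frequency; a word appears at most len(words) times.
    buckets = [[] for _ in range(len(words) + 1)]
    for word, count in freq.items():
        buckets[count].append(word)
    result = []
    # Walk buckets from highest frequency to lowest, taking words alphabetically.
    for count in range(len(words), 0, -1):
        for word in sorted(buckets[count]):
            result.append(word)
            if len(result) == k:
                return result
    return result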

Time Complexity:

O(m + n log n)

Where m is the length of the input array. Counting frequencies takes O(m) time. Creating buckets takes O(n) time where n is the number of unique words. Sorting words within each bucket takes O(n log n) time in the worst case, but since the number of words in each bucket is typically small, this is often closer to O(n). Collecting the k most frequent words takes O(k) time. The overall time complexity is O(m + n + n log n + k), which simplifies to O(m + n log n).

Space Complexity:

O(m)

We use a hash map to store word frequencies, which takes O(n) space where n is the number of unique words. We also need O(m) space for the buckets in the worst case. The overall space complexity is O(n + m), which simplifies to O(m) since n ≤ m.

Pseudocode

solution.pseudo
function topKFrequent(words, k):
    // Count frequency of each word
    freq = new HashMap()
    for each word in words:
        if word in freq:
            freq[word] = freq[word] + 1
        else:
            freq[word] = 1
    // Create list of all unique words
    uniqueWords = list of keys in freq
    // Sort words by frequency (descending) and alphabetically (ascending)
    sort uniqueWords by (-freq[word], word)
    // Return top k words
    return uniqueWords[0:k]
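
As a quick check, running the brute-force sketch from Approach 1 on the example used earlier behaves as expected:

top_k_frequent_brute(["i", "love", "leetcode", "i", "love", "coding"], 2)
# returns ['i', 'love']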