difficulty: Hard tags:

  • Heap
  • Data Structure Design
  • Hash Table title: Top K Frequent Words II

Top K Frequent Words II

Problem

Metadata

Description

Find top k frequent words in realtime data stream.

Implement three methods for Topk Class:

  1. TopK(k). The constructor.
  2. add(word). Add a new word.
  3. topk(). Get the current top k frequent words.

Notice

If two words have the same frequency, rank them by alphabet.

Example

  1. TopK(2)
  2. add("lint")
  3. add("code")
  4. add("code")
  5. topk()
  6. >> ["code", "lint"]

题解

此题较难,实际上和 Redis 的有序集合类似,综合使用字典和排序集合可完美解决。

Java

  1. public class TopK {
  2. private int k;
  3. private Map<String, Integer> wordFreq = null;
  4. private TreeSet<String> topkSet = null;
  5. class TopkComparator implements Comparator<String> {
  6. public int compare(String s1, String s2) {
  7. int s1Freq = wordFreq.get(s1), s2Freq = wordFreq.get(s2);
  8. if (s1Freq != s2Freq) {
  9. return s2Freq - s1Freq;
  10. } else {
  11. return s1.compareTo(s2);
  12. }
  13. }
  14. }
  15. /*
  16. * @param k: An integer
  17. */public TopK(int k) {
  18. // do intialization if necessary
  19. this.k = k;
  20. wordFreq = new HashMap<String, Integer>(k);
  21. topkSet = new TreeSet<String>(new TopkComparator());
  22. }
  23. /*
  24. * @param word: A string
  25. * @return: nothing
  26. */
  27. public void add(String word) {
  28. // write your code here
  29. if (wordFreq.containsKey(word)) {
  30. if (topkSet.contains(word)) {
  31. topkSet.remove(word);
  32. }
  33. wordFreq.put(word, wordFreq.get(word) + 1);
  34. } else {
  35. wordFreq.put(word, 1);
  36. }
  37. topkSet.add(word);
  38. if (topkSet.size() > k) {
  39. topkSet.pollLast();
  40. }
  41. }
  42. /*
  43. * @return: the current top k frequent words.
  44. */
  45. public List<String> topk() {
  46. // write your code here
  47. List<String> result = new ArrayList<String>(k);
  48. Iterator<String> it = topkSet.iterator();
  49. while (it.hasNext()) {
  50. result.add(it.next());
  51. }
  52. return result;
  53. }
  54. }

源码分析

复杂度分析

待续