274. H-Index

題目 / Problem

中文： 給定一個整數陣列 citations，其中 citations[i] 表示研究者第 i 篇論文被引用的次數。請回傳該研究者的 h 指數 (h-index)。

根據 h-index 的定義：h 指數是滿足「該研究者至少有 h 篇論文，每篇至少被引用 h 次」的最大整數 h。

English: Given an integer array citations where citations[i] is the number of citations of the researcher's i-th paper, return the researcher's h-index.

By definition: the h-index is the maximum value h such that the researcher has at least h papers each cited at least h times.

Constraints / 限制： - 1 <= n <= 5000 - 0 <= citations[i] <= 1000

Example / 範例：

Input:  citations = [3, 0, 6, 1, 5]
Output: 3

解釋：有 3 篇論文被引用次數 ≥ 3（即 3, 6, 5），剩下兩篇 ≤ 3，所以 h = 3。 Explanation: 3 papers each have at least 3 citations (3, 6, 5); the other 2 have at most 3, so h = 3.

名詞解釋 / Glossary

h 指數 / h-index：最大的 h 使得「至少 h 篇論文，每篇至少被引用 h 次」。注意 h 不能超過總論文數 n。/ The largest h for which at least h papers each have ≥ h citations. Capped by total paper count n.
計數排序 / Counting Sort (Bucket)：當數值範圍有限時，用一個額外陣列當「桶子」紀錄每個數值出現幾次，避免比較式排序的 O(n log n) 成本。/ When values fall in a small range, count occurrences in a bucket array — this skips comparison-based sorting's O(n log n) cost.
降序排序 / Descending Sort：由大到小排列。排序後 citations[i] 是第 i+1 高的引用數。/ Sort largest to smallest; afterwards citations[i] is the (i+1)-th highest citation count.
qsort：C 標準庫的排序函式，需傳入一個比較函式指標。/ C standard library sort; takes a comparator function pointer.
指標 / Pointer (int*)：儲存記憶體位址的變數。*p 取出該位址的內容，p[i] 等同於 *(p+i)。/ A variable holding a memory address. *p reads the value there; p[i] is shorthand for *(p+i).
後綴和直覺 / Suffix-count intuition：若把引用數從大到小排序，前 i 篇都至少有 citations[i-1] 次引用，這是判斷 h-index 的核心。/ Sorting descending makes "the top i papers each have ≥ citations[i-1] citations" trivially true, which is exactly what h-index needs.

思路

最直觀的做法是「列舉所有可能的 h 值」：對每個候選 h 從 0 到 n，掃過陣列數有幾篇論文 ≥ h，再判斷是否 ≥ h。這是 O(n²)，可行但慢。更好的觀察是：把 citations 由大到小排序之後，第 i（從 1 開始）篇論文的引用數就是 citations[i-1]，而前 i 篇論文每篇都至少有 citations[i-1] 次引用（因為已排序）。所以只要找到最大的 i 使得 citations[i-1] >= i，這個 i 就是答案。這種做法是 O(n log n)，主要花費在排序。

更快的做法用 計數排序（桶子）：注意到 h-index 不可能超過 n，所以引用數 ≥ n 的論文都可以併進同一個桶 bucket[n]。建一個長度 n+1 的桶陣列，掃一遍 citations 把每篇論文丟進對應桶（超過 n 的塞進 bucket[n]）。然後從 h = n 往 0 遞減，累計目前為止「引用數 ≥ h」的論文數 count；第一個讓 count >= h 的 h 就是答案。這是純 O(n) 時間、O(n) 空間。

The brute-force idea is to try every candidate h from 0 to n and count how many papers meet it — O(n²). A better observation: if we sort citations in descending order, then for each rank i (1-indexed), the top i papers all have at least citations[i-1] citations (because the array is sorted). So we want the largest i with citations[i-1] >= i. That gives O(n log n), dominated by the sort.

We can do even better with counting sort / bucketing. Two key facts: the answer can never exceed n, and any paper with more than n citations contributes the same as one with exactly n citations for h-index purposes. So allocate a bucket array of size n+1, and for each paper drop it into bucket[min(c, n)]. Then walk h from n down to 0, accumulating count — the number of papers with ≥ h citations. The first h where count >= h is the answer. This runs in O(n) time and O(n) extra space.

逐步走查 / Walkthrough

Input: citations = [3, 0, 6, 1, 5], so n = 5.

Step 1 — Build buckets / 建立桶子 (size n+1 = 6, indices 0..5; values >5 collapse to bucket 5):

Paper citations	Bucket index (`min(c, 5)`)
3	3
0	0
6	5 (capped)
1	1
5	5

Final bucket = [1, 1, 0, 1, 0, 2].

Step 2 — Scan h from n down to 0, accumulate count (papers with citations ≥ h):

h	bucket[h]	count (running total)	count >= h ?	Decision
5	2	2	2 >= 5? No	continue
4	0	2	2 >= 4? No	continue
3	1	3	3 >= 3? Yes	return 3 ✅

Answer / 答案：h = 3.

Solution — C

// Algorithm / 演算法：counting-sort bucket. Cap each citation at n, drop into a bucket,
// then sweep h from n down to 0 accumulating "papers with ≥ h citations". The first
// h where that running count >= h is the h-index. O(n) time, O(n) space.

#include <stdlib.h>   // malloc, calloc, free / 動態記憶體配置函式
#include <string.h>   // (not strictly needed, kept for portability / 可移植性保留)

int hIndex(int* citations, int citationsSize) {
    int n = citationsSize;                       // n = 論文總數 / total number of papers

    // 配置長度 n+1 的桶子陣列，calloc 會把每格初始化為 0
    // Allocate a bucket array of length n+1; calloc zero-initializes every slot.
    int* bucket = (int*)calloc(n + 1, sizeof(int));

    // 第一輪：把每篇論文丟進對應桶子 / First pass: drop each paper into its bucket
    for (int i = 0; i < n; i++) {
        int c = citations[i];                    // 取出第 i 篇的引用數 / read i-th citation count
        // 引用數 > n 的論文一律塞進 bucket[n]，因為 h 不可能超過 n
        // Anything > n collapses into bucket[n], since h cannot exceed n.
        if (c >= n) {
            bucket[n]++;                         // 大於等於 n 的篇數 +1 / count of papers with ≥ n citations
        } else {
            bucket[c]++;                         // 否則丟進對應引用數的桶子 / otherwise, exact citation bucket
        }
    }

    // 第二輪：從 h = n 往下掃，count 累計「引用數 >= h 的論文數」
    // Second pass: sweep h from n down, accumulating papers with ≥ h citations.
    int count = 0;                               // 後綴累計 / running suffix count
    for (int h = n; h >= 0; h--) {
        count += bucket[h];                      // 加上恰好 h 次引用的論文 / add papers with exactly h citations
        // 第一個滿足 count >= h 的 h 就是答案（h 從大到小掃，自然找到最大值）
        // First h with count >= h is the answer (we scan top-down, so it's the maximum).
        if (count >= h) {
            free(bucket);                        // 釋放桶子 / free the heap allocation
            return h;                            // 回傳答案 / return result
        }
    }

    // 理論上 h = 0 一定成立，迴圈內就會回傳；這行只是給編譯器看
    // h = 0 always satisfies count >= 0, so the loop always returns; this is just a fallback.
    free(bucket);
    return 0;
}

Solution — C++

// Algorithm / 演算法: same counting-sort bucket as the C version. Cap citations at n,
// bucket them, sweep h from n downward keeping a running count of "papers with ≥ h
// citations", and return the first h with count >= h. O(n) time, O(n) space.

#include <vector>          // std::vector — 動態陣列 / dynamic array container
#include <algorithm>       // std::min — 取最小值 / utility function

class Solution {
public:
    int hIndex(std::vector<int>& citations) {
        int n = static_cast<int>(citations.size());   // n 是論文總數 / total paper count

        // std::vector<int> 預設初始化為 0；長度 n+1 對應 h 可能值 0..n
        // std::vector<int> default-initializes elements to 0; length n+1 covers h = 0..n.
        std::vector<int> bucket(n + 1, 0);

        // range-for 是 C++11 的語法糖：for (auto x : container) 逐一拿出每個元素
        // Range-for is C++11 sugar: iterates each element, here as `int c`.
        for (int c : citations) {
            // std::min(c, n) 把超過 n 的引用數壓回 n，因為 h 不會超過 n
            // std::min(c, n) caps citations at n — h-index can't exceed n anyway.
            bucket[std::min(c, n)]++;                 // 對應桶子 +1 / increment matching bucket
        }

        int count = 0;                                // 累計後綴計數 / running suffix count
        // 從大的 h 開始掃，第一個成立的就是最大的 h
        // Sweep from large h downward; first hit is automatically the maximum.
        for (int h = n; h >= 0; --h) {
            count += bucket[h];                       // 把這一桶加進累計 / add this bucket's papers
            if (count >= h) {                         // 至少 h 篇論文，每篇 ≥ h 次引用 / definition satisfied
                return h;                             // 即是 h-index / this is the answer
            }
        }

        return 0;                                     // 不會執行到，h=0 必成立 / unreachable; h=0 always works
    }
};

複雜度 / Complexity

Time: O(n) — 兩次線性掃描：一次填桶子，一次從 n 往下累計。沒有排序，不需要 O(n log n)。 / Two linear passes — one to fill buckets, one to scan downward. No sort needed.
Space: O(n) — 桶子陣列長度為 n + 1。輸入陣列本身不算額外空間。 / The bucket array has length n + 1; input itself doesn't count as extra space.

Pitfalls & Edge Cases

h 不能超過 n / h cannot exceed n：即使有論文被引用一百萬次，h 最多就是 n。所以可以安全地用 min(c, n) 把大值壓回。/ Even with a million citations, h is bounded by n. Capping with min(c, n) is safe and necessary for the bucket size.
全部 0 的情況 / all zeros：例如 [0, 0, 0]。bucket[0] = 3，但 h = 3, 2, 1 時 count 都是 0；h = 0 時 count = 3 >= 0 成立，回傳 0。/ E.g. [0,0,0]: only h = 0 satisfies the definition, and the loop reaches it correctly.
單一元素 / single element：例如 [100]，n = 1，會被壓進 bucket[1]，h = 1 時 count = 1 >= 1 成立，回傳 1。/ [100] with n = 1 collapses into bucket[1]; h = 1 succeeds, returning 1.
由大到小掃的重要性 / why scan top-down：必須從 h = n 往下，第一個滿足的就是最大值。從小往大掃會錯過更大的解。/ Scanning from n downward guarantees we find the maximum h first; bottom-up would find a valid but non-maximum value.
C 中要 free / remember free in C：calloc 配置的記憶體必須釋放，否則記憶體洩漏。提早 return 的分支也要先 free。/ Heap memory from calloc must be released on every return path to avoid leaks.
不要漏掉 bucket[n] / don't forget bucket[n]：桶子大小是 n + 1（索引 0..n），不是 n。少一格會在 c >= n 時越界寫入。/ Bucket size is n + 1, not n — off-by-one here causes out-of-bounds writes for capped values.