128. Longest Consecutive Sequence

題目 / Problem

中文： 給你一個未排序的整數陣列 nums，請回傳「最長連續元素序列」的長度。所謂連續序列，是指像 [1, 2, 3, 4] 這樣每個數字都比前一個大 1 的一組數字（在原陣列中的順序不重要，重複的數字只算一次）。題目要求你的演算法時間複雜度必須是 O(n)。

English: Given an unsorted integer array nums, return the length of the longest run of consecutive integers. A consecutive sequence is a set of numbers like [1, 2, 3, 4] where each value is exactly 1 greater than the previous one (their order in the original array doesn't matter, and duplicates count only once). Your algorithm must run in O(n) time.

Constraints / 限制： - 0 <= nums.length <= 10^5 - -10^9 <= nums[i] <= 10^9

Worked example / 範例： - Input: nums = [100, 4, 200, 1, 3, 2] - Output: 4 - Explanation: The longest consecutive sequence is [1, 2, 3, 4], whose length is 4. The values 100 and 200 are isolated (length 1 each).

名詞解釋 / Glossary

hash set / 雜湊集合：A container that stores values with no duplicates and lets you ask "is value X in here?" in average O(1) (constant) time. 一種儲存不重複數值的容器，可以用平均 O(1)（常數）時間查詢「某個值在不在裡面」。
O(1) lookup / 常數時間查詢：Looking something up takes the same tiny amount of time no matter how many elements are stored. 不論裡面存了多少資料，查詢一筆所花的時間都差不多，與資料量無關。
amortized / 攤還分析：A way of averaging cost over many operations. Even if one step is occasionally slow, the total cost across all steps stays low. 把成本平均到許多次操作上看待；即使偶爾某一步較慢，所有步驟的「總和」成本仍然很低。
unordered_set (C++)：The C++ standard-library hash set. 「unordered」means it does not keep elements sorted, which is what makes lookups fast. C++ 標準函式庫的雜湊集合；「unordered」表示不維護排序，因此查詢很快。
sequence start / 序列起點：A number x is the start of a consecutive run only if x - 1 is NOT present. We only count length from these starts, which is the key trick. 一個數字 x 只有在 x - 1 不存在時，才算是某段連續序列的起點；我們只從起點開始數長度，這是本題的關鍵技巧。

思路

中文： 最直覺的暴力解是：對每個數字 x，去檢查 x+1、x+2、x+3……在不在陣列裡，能接多長就接多長。但「在不在陣列裡」如果用線性掃描，每次都要 O(n)，整體會變成 O(n²) 甚至更糟，題目明確要求 O(n)，所以行不通。另一個想法是先排序再掃一遍，排序後連續的數字就排在一起，這是 O(n log n)，雖然能過，但仍不符合 O(n) 的要求。

關鍵的優化分兩步。第一步：把所有數字丟進一個雜湊集合，這樣「某個數字在不在」就能用平均 O(1) 查到，同時順手去掉了重複值。第二步也是最重要的觀念——不要從每個數字都往上數，那會重複計算同一段序列。我們只從「序列的起點」開始數。怎麼判斷 x 是不是起點？很簡單：如果 x - 1 不在集合裡，那 x 前面接不上任何數，它就是某段連續序列的最左端。確認是起點後，才往上一直找 x+1, x+2, ...，數出這段的長度。

為什麼這樣總共只是 O(n)？因為每個數字最多只會被「往上數」碰到一次——它要嘛是某段序列的起點（觸發一次往上數），要嘛在某段往上數的過程中被經過一次，但絕不會成為另一段的起點（因為它的前一個數存在）。所以內層的 while 迴圈在整個執行過程中累計最多跑 n 次，加上外層掃一遍 n 次，總共仍是線性。

English: The brute-force idea is: for each number x, walk upward checking x+1, x+2, … and count how far the chain extends. But if "is this value present?" is answered by scanning the array, each check costs O(n) and the whole thing becomes O(n²). The problem explicitly demands O(n), so that's out. Sorting first and scanning once works in O(n log n), which would pass but still misses the O(n) target.

The optimization has two parts. First, dump every number into a hash set so "is value present?" becomes an average O(1) query — this also drops duplicates for free. Second, and most important: don't count upward from every number, because that re-walks the same chain over and over. Instead, only start counting from the beginning of a run. How do you know x is a beginning? If x - 1 is not in the set, nothing can connect below x, so it is the leftmost element of its run. Only then do we walk x+1, x+2, … to measure that run's length.

Why is the total still O(n)? Because each number gets "walked upward over" at most once: a number is either the start of a run (it triggers one upward walk) or it gets passed through during some other run's walk — but it can never itself be a start (its predecessor exists). So the inner while-loop runs at most n times in total across the entire execution. Combined with the single outer pass of n, the work stays linear.

逐步走查 / Walkthrough

Example input / 範例輸入：nums = [100, 4, 200, 1, 3, 2]

Step 0 — build the set / 建立集合： set = {100, 4, 200, 1, 3, 2}. longest = 0.

Now iterate over each number and ask: "is x - 1 in the set?" If yes, skip (not a start). If no, walk upward.

x (current)	`x-1` in set?	Is start? / 是起點？	Upward walk / 往上數	This run's length / 本段長度	longest
100	`99`? No	✅ yes	`101`? No → stop	1	1
4	`3`? Yes	❌ no (skip)	—	—	1
200	`199`? No	✅ yes	`201`? No → stop	1	1
1	`0`? No	✅ yes	`2`✓ `3`✓ `4`✓ `5`? No → stop	4	4
3	`2`? Yes	❌ no (skip)	—	—	4
2	`1`? Yes	❌ no (skip)	—	—	4

Result / 結果： longest = 4. Notice that only 1 (the true start of 1,2,3,4) triggered the long walk; 2, 3, 4 were all skipped as non-starts, which is exactly why we avoid redundant work. 注意只有 1（序列真正的起點）觸發了長的往上數；2、3、4 都因為「不是起點」被跳過，這正是我們避免重複計算的原因。

Solution — C

// 演算法：把所有數字放入雜湊集合，只從「沒有前一個數 (x-1)」的起點開始往上數，
// 累計每段連續序列的長度，取最大值。整體 O(n)。
// Algorithm: put all numbers in a hash set, then from each "start" (a value whose
// x-1 is absent) walk upward counting the run length; track the maximum. Overall O(n).

#include <stdlib.h>   // 提供 malloc / free / qsort 等 / provides malloc, free, etc.

// 為了在 C 裡有一個 O(1) 的「集合」，我們用開放定址雜湊表自己實作。
// To get an O(1) "set" in C, we build an open-addressing hash table ourselves.

#define EMPTY_SLOT 0x80000000  // 用一個不可能的整數標記空格 (此值超出題目範圍)
                               // A sentinel marking an empty slot (outside the input range).

// 把任意整數映射到表格索引 / map any integer to a table index.
static int hashIndex(long long key, int cap) {
    // key 可能是負數，先加上 cap 再取餘數，確保結果非負 / handle negatives: result stays in [0, cap).
    long long h = key % cap;
    if (h < 0) h += cap;
    return (int)h;
}

int longestConsecutive(int* nums, int numsSize) {
    if (numsSize == 0) return 0;  // 空陣列直接回傳 0 / empty input → 0.

    // 容量取 2 倍以上以降低碰撞；+1 避免容量為 0 / size table > 2x to keep collisions low.
    int cap = numsSize * 2 + 1;

    // 配置表格，每格存一個 int。malloc 回傳一塊未初始化記憶體的指標。
    // Allocate the table, one int per slot. malloc returns a pointer to raw memory.
    int* table = (int*)malloc(sizeof(int) * cap);

    // 把每一格初始化為 EMPTY_SLOT，表示目前是空的 / mark every slot empty first.
    for (int i = 0; i < cap; i++) table[i] = EMPTY_SLOT;

    // ---- 插入階段：把所有數字放進雜湊表，自動忽略重複 ----
    // ---- Insert phase: put every number into the table (duplicates harmlessly overwrite). ----
    for (int i = 0; i < numsSize; i++) {
        int v = nums[i];                 // 目前要插入的值 / the value to insert.
        int idx = hashIndex(v, cap);     // 算出它的起始格 / its starting slot.
        // 線性探測：若該格被別的值佔用，就往後一格找空位 (開放定址法)。
        // Linear probing: if the slot is taken by a different value, step to the next slot.
        while (table[idx] != EMPTY_SLOT && table[idx] != v) {
            idx = (idx + 1) % cap;       // % cap 讓索引繞回開頭 / wrap around to the front.
        }
        table[idx] = v;                  // 寫入值 (若已存在則原地覆寫，等於去重) / store (dups overwrite).
    }

    // ---- 查詢函式的內嵌寫法：用 lambda 在 C 不行，所以直接展開為一段探測迴圈 ----
    // In C there are no closures, so we inline the "contains" search where needed below.

    int longest = 1;  // 至少有一個元素，所以最短答案是 1 / at least one element exists → min 1.

    // ---- 主迴圈：對每個數字判斷它是不是序列起點 ----
    // ---- Main loop: for each number decide whether it starts a run. ----
    for (int i = 0; i < numsSize; i++) {
        int x = nums[i];

        // 檢查 x-1 是否存在：若存在，x 就不是起點，跳過以避免重複計算。
        // Check whether x-1 exists; if so, x is not a start, so skip (avoids redundant walks).
        long long prev = (long long)x - 1;   // 用 long long 防止 x = -2^31 時減 1 溢位 / avoid underflow.
        int pidx = hashIndex(prev, cap);
        int prevExists = 0;
        while (table[pidx] != EMPTY_SLOT) {  // 探測直到遇到空格才確定不存在 / probe until an empty slot.
            if (table[pidx] == (int)prev) { prevExists = 1; break; }
            pidx = (pidx + 1) % cap;
        }
        if (prevExists) continue;            // x-1 在 → x 不是起點 / x-1 present → not a start.

        // 到這裡 x 是某段序列的起點，開始往上數 x+1, x+2, ...
        // x is a run start: walk upward counting x+1, x+2, ...
        int curNum = x;       // 目前數到的數字 / the number we have reached.
        int curLen = 1;       // 這段的長度，從 1 起算 / current run length, starting at 1.

        while (1) {           // 無限迴圈，內部用 break 跳出 / loop until next number is absent.
            long long next = (long long)curNum + 1;  // 下一個要找的數 / the next value to look for.
            int nidx = hashIndex(next, cap);
            int found = 0;
            while (table[nidx] != EMPTY_SLOT) {       // 在表中搜尋 next / search for next in the table.
                if (table[nidx] == (int)next) { found = 1; break; }
                nidx = (nidx + 1) % cap;
            }
            if (!found) break;                        // 找不到下一個 → 這段結束 / chain ends.
            curNum = (int)next;                       // 接上，往上移動 / extend the chain.
            curLen++;                                 // 長度加一 / length grows by one.
        }

        if (curLen > longest) longest = curLen;       // 更新最長紀錄 / update the maximum.
    }

    free(table);   // 釋放 malloc 配置的記憶體，避免記憶體洩漏 / free the table to avoid a memory leak.
    return longest;
}

Solution — C++

// 演算法：把所有數字放進 unordered_set (雜湊集合)，只從「x-1 不存在」的起點往上數，
// 累計每段連續序列長度並取最大。整體 O(n)。
// Algorithm: load all numbers into an unordered_set, walk upward only from run-starts
// (values whose x-1 is absent), track the longest run. Overall O(n).

#include <vector>          // std::vector 動態陣列 / dynamic array.
#include <unordered_set>   // std::unordered_set 雜湊集合，O(1) 平均查詢 / hash set, O(1) average lookup.
#include <algorithm>       // std::max 取較大值 / for std::max.
using namespace std;

class Solution {
public:
    int longestConsecutive(vector<int>& nums) {
        // 用 nums 的頭尾迭代器一次建好集合；重複值會自動被忽略。
        // Build the set directly from nums's begin/end iterators; duplicates are dropped automatically.
        unordered_set<int> s(nums.begin(), nums.end());

        int longest = 0;  // 答案；空陣列時保持 0 / answer; stays 0 for empty input.

        // range-for：依序取出集合中每個元素 x (此處用 const 參考避免複製)。
        // range-for: iterate over each element x in the set (const reference avoids copying).
        for (const int x : s) {
            // s.count(x-1) 回傳 0 或 1，代表 x-1 在不在集合裡。
            // s.count(x-1) returns 0 or 1: whether x-1 is present.
            // 若 x-1 存在，x 不是起點，跳過 / if x-1 exists, x is not a start, skip.
            if (s.count(x - 1)) continue;

            // x 是序列起點，開始往上數 / x starts a run; walk upward.
            int curNum = x;     // 目前到達的數字 / current value reached.
            int curLen = 1;     // 本段長度 / current run length.

            // 只要下一個數還在集合裡就繼續延伸 / keep extending while the next value exists.
            while (s.count(curNum + 1)) {
                curNum++;       // 移到下一個連續數 / move to the next consecutive number.
                curLen++;       // 長度加一 / grow the length.
            }

            longest = max(longest, curLen);  // 更新最長紀錄 / keep the maximum.
        }

        return longest;
    }
};

複雜度 / Complexity

Time: O(n) — 建立集合花 O(n)。主迴圈看似有兩層 (外層 for + 內層 while)，但內層 while 只在「起點」才會延伸，而每個數字在整個過程中最多被往上數經過一次，所以內層 while 累計總共最多跑 n 次。因此總時間是線性的，n 指陣列元素個數。 Building the set is O(n). The main loop looks nested, but the inner while only walks from run-starts, and every value is walked over at most once across the whole run — so the inner loop runs at most n times in total. The result is linear, where n is the number of elements.
Space: O(n) — 雜湊集合最多存 n 個不重複的數字。 The hash set holds up to n distinct numbers.

Pitfalls & Edge Cases

空陣列 / Empty input (numsSize == 0)：必須先處理，否則 C 版會配置容量並回傳錯誤值，C++ 版的 longest 初值設為 0 即可正確回傳。Handle it first; the C version guards with an early return 0, and C++ correctly returns its initial longest = 0.
不要從每個數都往上數 / Don't walk upward from every number：若省略 if (s.count(x-1)) continue; 這個起點檢查，演算法仍然正確，但會對同一段序列重複往上數，退化成 O(n²)，無法通過 O(n) 要求。Skipping the start-check still gives the right answer but re-walks each run repeatedly, degrading to O(n²).
重複值 / Duplicates（例如 [1,0,1,2]）：用集合天然去重，所以 1 只算一次，答案是 3 而非更多。The set drops duplicates automatically, so repeated values don't inflate the count.
整數溢位 / Integer overflow：值域到 ±10^9，而 x 本身在 int 範圍內；C 版在計算 x-1 與 x+1 時用 long long 暫存，避免極端值 (如 INT_MIN) 減 1 溢位。Computing x-1/x+1 near INT_MIN/INT_MAX can overflow int; the C code casts to long long for safety. (C++ 版 curNum+1 最大只到約 10^9，在 int 範圍內安全。)
回傳值混淆 / Return-value confusion：回傳的是「序列長度」不是「最大數字」或「序列本身」。Return the length of the run, not the largest element or the sequence itself.
C 雜湊表容量為 0 / Zero-capacity table in C：cap = numsSize*2 + 1 的 +1 確保即使極小輸入容量也 ≥ 1，避免對 0 取餘數 (除以零) 造成未定義行為。The +1 guarantees a non-zero modulus, preventing division-by-zero.