3043. Find the Length of the Longest Common Prefix

題目 / Problem

中文：給你兩個正整數陣列 arr1 與 arr2。一個正整數的「前綴」是由它最左邊若干位數字組成的整數（例如 123 是 12345 的前綴）。對於每一對 (x, y)（其中 x ∈ arr1、y ∈ arr2），找出 x 和 y 的最長共同前綴；最終回傳所有配對中最長的那個共同前綴的長度（位數）。若沒有任何共同前綴，回傳 0。

English: You are given two arrays of positive integers arr1 and arr2. A prefix of a positive integer consists of one or more of its leftmost digits (e.g. 123 is a prefix of 12345). For every pair (x, y) with x ∈ arr1 and y ∈ arr2, find the longest common prefix; return the length (number of digits) of the longest such prefix across all pairs. Return 0 if no pair shares any prefix.

Constraints: - 1 <= arr1.length, arr2.length <= 5 * 10^4 - 1 <= arr1[i], arr2[i] <= 10^8 （所以每個數最多 9 位 / so each integer has at most 9 digits）

Example: arr1 = [1, 10, 100], arr2 = [1000] → output 3. The pair (100, 1000) shares the prefix 100, which has 3 digits.

名詞解釋 / Glossary

Prefix / 前綴：從一個整數最左邊開始連續取若干位形成的新整數。例如 12345 的所有前綴是 1, 12, 123, 1234, 12345。You can generate them by repeatedly dividing by 10 from the right (12345 → 1234 → 123 → 12 → 1).
Hash set / 雜湊集合：一種允許 O(1) 平均時間插入與查詢的容器。在 C 中我們手寫一個開放定址（open-addressing）的雜湊表；在 C++ 中直接用 std::unordered_set。
Open addressing / 開放定址：當雜湊衝突發生時，不用鏈表，而是往後找下一個空位的方法。本題用最簡單的 linear probing（線性探測），每次往後挪一格直到找到目標或空位。
Load factor / 負載因子：雜湊表中已用槽位 ÷ 總槽位。維持在 0.5 左右可保證探測次數很少。本題最多約 4.5 × 10⁵ 個前綴，所以選 2^20 ≈ 10^6 個槽位很安全。
unordered_set<int> / 無序集合：C++ STL 提供的雜湊集合。.insert(x) 插入元素，.count(x) 回傳 0 或 1。
Integer division x / 10 / 整數除以 10：在 C/C++ 中，整數相除會自動截斷小數部分，等於把一個正整數的最右邊一位「砍掉」，正好用來枚舉前綴。

思路

最直觀的暴力作法是：對於每一對 (x, y)，把它們轉成字串並從左邊逐字比較，紀錄最長共同位數。但配對數量是 |arr1| × |arr2| 最多可達 2.5 × 10⁹，明顯會超時。關鍵觀察是：兩個整數 x 和 y 共享某個前綴 p，等價於 p 同時是 x 的某個前綴、也是 y 的某個前綴。所以我們不需要兩兩配對，只要先把 arr1 中所有元素的所有前綴丟進一個雜湊集合 S，再對 arr2 中每個 y 枚舉它的所有前綴，看看哪些落在 S 裡，記錄最大的那個位數即可。一個數最多 9 位，所以每個元素只貢獻 ≤ 9 個前綴，總前綴數 ≤ 9 × 5×10⁴ = 4.5×10⁵，完全可以接受。實作上有個小技巧：對於 arr2 中的 y，從完整的 y 開始反覆 /10，第一個命中雜湊表的前綴就是 y 在 S 中能匹配到的最長前綴——可以立刻 break。

The brute-force idea is to compare every pair (x, y) digit by digit, but with up to 2.5 × 10⁹ pairs that's hopeless. The key insight: any integer p that is a prefix shared by some x ∈ arr1 and some y ∈ arr2 must appear as a prefix of some element on each side. So we don't need pairs at all — we just need a set S containing every prefix of every element of arr1, and then for each y ∈ arr2 we walk through y's prefixes and check membership in S. Because each value is ≤ 10⁸ it has at most 9 digits, so the total number of inserted prefixes is at most 9 × 5×10⁴ ≈ 4.5×10⁵ — small enough for a hash set. A nice optimization on the query side: for each y start from the full y and repeatedly divide by 10; the first prefix you find in S is automatically the longest one for that y, so you can break immediately.

逐步走查 / Walkthrough

用範例 1：arr1 = [1, 10, 100]、arr2 = [1000]。

Step 1 — 建立雜湊集合 S (from arr1)：

處理元素 / element	產生的前綴 / prefixes inserted	當前 S
`1`	`1`	`{1}`
`10`	`10`, `1`	`{1, 10}`
`100`	`100`, `10`, `1`	`{1, 10, 100}`

Step 2 — 對 arr2 中每個 y 查詢最長前綴 / probe each y：

y = 1000，從完整的 y 一直除以 10：

步驟	當前前綴 `y`	在 S 中嗎?	動作
1	`1000`	❌	`y /= 10`
2	`100`	✅	位數 = 3，更新 `best = 3`，break

Step 3 — 回傳 / return：best = 3 ✅

Solution — C

// 演算法：把 arr1 每個元素的「所有前綴」放進手寫雜湊表 S（開放定址 + 線性探測）。
// Algorithm: insert every prefix of every arr1[i] into a hand-rolled open-addressing
// hash table S. Then for each y in arr2, scan y's prefixes from longest to shortest
// (by repeatedly dividing by 10) and record the first hit's digit count.

#include <stdlib.h>   // calloc / free
#include <string.h>   // (not strictly needed but commonly included)

// 計算正整數 x 的位數 / count digits of positive integer x.
static int numDigits(int x) {
    int d = 0;                              // 累計位數 / digit counter
    while (x > 0) {                         // 還有位數沒砍完 / while digits remain
        d++;                                // 位數 +1 / increment count
        x /= 10;                            // 砍掉最右邊一位 / drop rightmost digit
    }
    return d;                               // 回傳位數 / return result
}

int longestCommonPrefix(int* arr1, int arr1Size, int* arr2, int arr2Size) {
    // 雜湊表槽位數選 2^20 ≈ 1,048,576，最多約 4.5×10^5 個鍵，負載因子 < 0.5。
    // Table size 2^20 keeps load factor < 0.5 for up to ~4.5×10^5 keys.
    const unsigned CAP = 1u << 20;          // 容量必須是 2 的次方 / capacity must be power of 2
    const unsigned MASK = CAP - 1;          // 用 & MASK 取餘（比 % 快）/ fast modulo via bitmask

    // calloc 把整塊記憶體歸零；因為題目保證輸入 ≥ 1，我們用 0 代表「空槽」。
    // calloc zeroes the memory; since all input values are ≥ 1, 0 safely means "empty slot".
    int *keys = (int*)calloc(CAP, sizeof(int));

    // -------- Step 1: 把 arr1 所有前綴塞進雜湊表 / insert all prefixes from arr1 --------
    for (int i = 0; i < arr1Size; i++) {    // 走訪 arr1 每個元素 / iterate arr1
        int x = arr1[i];                    // 取目前的數 / current value
        while (x > 0) {                     // 一直砍到變 0 / generate all prefixes
            // 用 Knuth's multiplicative hash：把 int 乘上一個質數常數，再取低位作為 index。
            // Knuth multiplicative hash: multiply by a large prime-ish constant and mask.
            unsigned h = ((unsigned)x * 2654435761u) & MASK;

            // 線性探測：若該槽已被別的 key 佔住，就往後一格找。
            // Linear probing: if occupied by a different key, step forward.
            while (keys[h] != 0 && keys[h] != x) {
                h = (h + 1) & MASK;          // 環狀往後 / wrap-around forward step
            }
            keys[h] = x;                    // 找到空槽或本身已存在，寫入即可 / write (idempotent)
            x /= 10;                        // 砍掉最右一位，產生下一個前綴 / next shorter prefix
        }
    }

    // -------- Step 2: 對 arr2 每個 y 找最長能命中 S 的前綴 / probe arr2 ---------------
    int best = 0;                           // 目前最佳答案 / best length so far
    for (int j = 0; j < arr2Size; j++) {    // 走訪 arr2 每個元素 / iterate arr2
        int y = arr2[j];                    // 從完整的 y 開始 / start from full y
        while (y > 0) {                     // 嘗試所有前綴（由長到短）/ longest → shortest
            unsigned h = ((unsigned)y * 2654435761u) & MASK;  // 同樣的雜湊函式 / same hash
            int found = 0;                  // 旗標：是否找到 / found flag
            while (keys[h] != 0) {          // 探測直到遇到空槽 / probe until empty
                if (keys[h] == y) { found = 1; break; }       // 命中！/ hit
                h = (h + 1) & MASK;          // 否則繼續往後 / otherwise keep probing
            }
            if (found) {                    // 第一次命中就是最長前綴 / first hit = longest
                int d = numDigits(y);       // 位數即答案候選 / digit count is the candidate
                if (d > best) best = d;     // 更新全域最佳 / update best
                break;                      // 不必再砍 y / done with this y
            }
            y /= 10;                        // 沒命中，砍一位再試 / try shorter prefix
        }
    }

    free(keys);                             // 釋放雜湊表記憶體 / release the table
    return best;                            // 回傳答案 / return result
}

Solution — C++

// 演算法：和 C 版完全一樣，差別只在用 std::unordered_set 取代手寫雜湊表。
// Algorithm: identical to the C version, but using std::unordered_set instead of a
// hand-rolled hash table. Insert every prefix from arr1, then for each y in arr2 walk
// its prefixes from longest to shortest and stop at the first one present in the set.

#include <vector>
#include <unordered_set>
using namespace std;

class Solution {
public:
    int longestCommonPrefix(vector<int>& arr1, vector<int>& arr2) {
        // unordered_set<int>：STL 提供的雜湊集合；insert / count 平均 O(1)。
        // unordered_set<int>: STL hash set with average O(1) insert and lookup.
        unordered_set<int> prefixes;

        // reserve 預先配置桶數，避免插入過程中反覆 rehash，加速約 1.5–2x。
        // Pre-reserve buckets so we don't rehash repeatedly while inserting.
        prefixes.reserve(arr1.size() * 9);

        // range-for + 值複製：x 是 arr1 元素的副本，可以安全修改不影響原陣列。
        // Range-for with a value copy: x is a local copy, safe to mutate.
        for (int x : arr1) {
            while (x > 0) {                  // 枚舉 x 的所有前綴 / enumerate prefixes
                prefixes.insert(x);          // 插入雜湊集合（重複插入不影響）/ idempotent insert
                x /= 10;                     // 砍掉最右一位 / drop rightmost digit
            }
        }

        int best = 0;                        // 目前最佳長度 / best length so far
        for (int y : arr2) {                 // 對 arr2 每個 y / for each y in arr2
            while (y > 0) {                  // 由長到短嘗試 y 的前綴 / longest → shortest
                // count(y) 回傳 0 或 1，表示 y 是否在集合中。
                // count(y) returns 0 or 1, indicating membership.
                if (prefixes.count(y)) {
                    int d = 0;               // 計算 y 的位數 / count digits of y
                    for (int t = y; t > 0; t /= 10) d++;  // 經典數位計數迴圈 / classic loop
                    best = max(best, d);     // 更新最佳 / update best
                    break;                   // 第一次命中即最長 / first hit is longest
                }
                y /= 10;                     // 沒命中，再砍一位 / shorten and retry
            }
        }
        return best;                         // 回傳答案 / return answer
    }
};

複雜度 / Complexity

Time: O((n + m) · D)，其中 n = arr1.length、m = arr2.length、D = 9（10⁸ 的最大位數）。插入階段對每個 arr1[i] 最多產生 D 個前綴並做 O(1) 雜湊插入，共 O(n · D)；查詢階段對每個 arr2[j] 最多查 D 個前綴，共 O(m · D)。本題 D ≤ 9 是小常數，整體實際上接近 O(n + m)。 Each element contributes at most D = 9 prefixes; both insert and lookup are O(1) amortized, giving O((n + m) · D). Since D is bounded by 9, this is effectively linear.
Space: O(n · D) — 雜湊表最多存 n · D ≤ 4.5×10⁵ 個鍵。C 版本的常數槽數 2^20 雖然偏大，但仍是 O(n · D) 量級。 The hash table holds at most n · D distinct prefixes. The C version's 2^20 slots are a constant factor on top of that bound.

Pitfalls & Edge Cases

暴力 O(n·m·D) 會 TLE / Brute force times out：n, m 各到 5×10⁴，配對數 2.5×10⁹ 絕對超時——必須用雜湊集合把問題拆成「分別處理 arr1 與 arr2」。
回傳的是長度不是前綴本身 / Return length, not the prefix value：題目要的是位數，不是那個整數。容易讀題時看錯。
找到第一個命中就要 break / Break on the first hit when probing y：因為我們是從長到短枚舉 y 的前綴，第一個出現在集合裡的前綴必然是 y 在 S 中能匹配的最長前綴。若繼續往下找，只會找到更短的，浪費時間（雖不影響正確性）。
整數除以 10 是「砍尾」不是「砍頭」 / x / 10 drops the last digit：這正是我們要的——一個正整數反覆 /10 會依序產生所有前綴（例如 12345 → 1234 → 123 → 12 → 1）。注意：x % 10 是相反方向（取最後一位），切勿混淆。
int 不會溢位 / No overflow risk：所有數值 ≤ 10⁸，遠在 int 範圍內；雜湊計算用 unsigned 是因為要做 wrap-around 乘法，不是因為怕溢位。
C 版的 0 哨兵 / 0 as empty sentinel in C：因為題目保證 arr[i] ≥ 1，所以 0 不可能成為有效鍵，可以安全當作「空槽」標記。如果題目允許 0，就必須額外的 used[] 旗標陣列。
同陣列內配對不算 / Pairs within the same array don't count：題目明確說明，這也是為什麼我們不能直接把 arr1 ∪ arr2 全部丟一個集合然後找最大重複前綴；必須分兩邊處理。