3093. Longest Common Suffix Queries

題目 / Problem

中文給你兩個字串陣列 wordsContainer 和 wordsQuery。對於每一個 wordsQuery[i]，要在 wordsContainer 裡找出一個字串，使它與 wordsQuery[i] 的最長公共後綴最長。後綴就是從字串「結尾」往前看的一段連續字元（例如 "abcd" 的後綴有 "d"、"cd"、"bcd"、"abcd"）。

如果有多個字串都共享同樣長度的最長公共後綴，就選長度最短的那一個；如果長度也一樣，就選在 wordsContainer 中出現最早（索引最小）的那一個。回傳一個整數陣列 ans，ans[i] 是被選中字串在 wordsContainer 裡的索引。注意空後綴 "" 也是合法的「公共後綴」，所以一定找得到答案。

English You are given two string arrays wordsContainer and wordsQuery. For each wordsQuery[i], find a string in wordsContainer that shares the longest common suffix with it. A suffix is a contiguous chunk read from the end of a string (e.g. suffixes of "abcd" are "d", "cd", "bcd", "abcd").

If several container strings tie on the longest common suffix length, pick the one with the smallest length; if still tied, pick the one that appears earliest (smallest index). Return an integer array ans where ans[i] is the chosen string's index. The empty suffix "" always counts as a common suffix, so every query has an answer.

Constraints / 限制 - 1 <= wordsContainer.length, wordsQuery.length <= 10^4 - 1 <= wordsContainer[i].length, wordsQuery[i].length <= 5 * 10^3 - All characters are lowercase English letters. - Sum of wordsContainer[i].length ≤ 5 * 10^5; sum of wordsQuery[i].length ≤ 5 * 10^5.

Worked example / 範例 wordsContainer = ["abcd","bcd","xbcd"], wordsQuery = ["cd","bcd","xyz"] → [1,1,1]. For "cd", all three share suffix "cd"; among them index 1 ("bcd", length 3) is shortest. For "xyz", nobody shares any suffix, so the common suffix is "" shared by all three — again the shortest is index 1.

名詞解釋 / Glossary

後綴 / suffix: 字串末尾的一段連續字元。比較「公共後綴」就是從兩個字串的最後一個字元開始往前逐字比對，看能連續匹配多少個。
反轉字串 / reversing a string: 把字串首尾顛倒。"abcd" → "dcba"。反轉後，比較「後綴」就變成比較「前綴」（開頭的一段），這讓問題更好處理。
前綴 / prefix: 字串開頭的一段連續字元。反轉之後，原本的最長公共後綴就等於最長公共前綴 / after reversing, longest common suffix becomes longest common prefix.
字典樹 / Trie (prefix tree): 一種樹狀資料結構，每條從根節點往下的路徑代表一個字串的前綴。共用相同前綴的字串會共用同一段路徑，所以查詢前綴非常快。Each edge is labelled with one letter; walking down spells out a prefix.
節點 / node: 字典樹中的一個點。這裡每個節點額外記錄「目前經過此節點的容器字串中，最佳的那個索引」/ each node stores the best container index among all words passing through it.
子節點陣列 / children array: 每個節點有 26 個槽位（對應 26 個小寫字母），children[c] = -1 表示該字母方向還沒有子節點。-1 is a sentinel meaning "no child yet".
哨兵值 / sentinel value: 用一個不可能是正常資料的特殊值來表示「空 / 不存在」，這裡用 -1。

思路

最直接的暴力法是：對每個查詢字串，逐一和每個容器字串比較，從尾巴往前數出公共後綴長度，再依規則挑最佳的。但容器和查詢都可能有 10^4 個，字串又可能長達 5×10^3，兩兩相比會是大約 10^8 × 5×10^3 的量級，遠遠超時。關鍵觀察是：很多字串的後綴是重疊的（例如 "abcd"、"bcd"、"xbcd" 都以 "bcd" 結尾），暴力法把這些重複的比較做了無數次。我們需要一個能「共享」這些公共後綴、只比一次的結構。提示告訴我們：把字串反轉，後綴就變前綴，而「處理一堆字串的公共前綴」正是字典樹 (Trie) 最擅長的事。於是做法是：把每個容器字串反轉後插入字典樹，插入時走過的每一個節點都代表「某個共同後綴」，我們在那個節點記錄「目前看到的最佳容器索引」。最佳的定義是：長度最短優先，長度相同則索引較小（較早）優先。因為我們是按索引由小到大插入的，所以只在「長度嚴格更短」時才覆蓋，這樣同長度時自然保留了較早的那個——這就是維持規則的關鍵不變量。查詢時，把查詢字串也反轉，從根節點沿著字母往下走，能走多深就走多深；停下來的那個節點記錄的最佳索引就是答案。如果第一步就走不下去（沒有共同後綴），就停在根節點，而根節點記錄的是全體中最短、最早的那個，正好符合空後綴的規則。

The brute force compares every query against every container word, counting the matching suffix character by character from the tail. With up to 10^4 words on each side and lengths up to 5×10^3, that blows past the time limit. The insight is that many words share suffixes ("abcd", "bcd", "xbcd" all end in "bcd"), and brute force redoes those comparisons over and over. We want a structure that lets shared suffixes be matched once. The hint points the way: reverse the strings so "suffix" becomes "prefix," and grouping many strings by common prefix is exactly what a Trie does. So we insert each reversed container word into a Trie; every node on the insertion path represents one common suffix, and at that node we store the best container index seen so far. "Best" means shortest length first, then smallest index. Since we insert in index order, we only overwrite when a strictly shorter word arrives — so equal-length ties keep the earlier index automatically. That overwrite rule is the invariant that encodes the tie-breaking. To answer a query we reverse it too, walk down from the root as far as the letters let us, and read off the best index stored at the node where we stop. If we can't move even one step (no shared suffix), we stop at the root, whose stored index is the globally shortest-and-earliest word — exactly what the empty-suffix rule requires.

逐步走查 / Walkthrough

Example 1: wordsContainer = ["abcd","bcd","xbcd"], wordsQuery = ["cd","bcd","xyz"]. Reversed container words: "dcba", "dcb", "dcbx". We build the Trie. Each node tracks (bestIdx, bestLen); root starts as (-1, ∞).

Insertion / 插入

Step	Word (i, len)	Action on path	Node updates
1	`"abcd"` (i=0, len=4)	root → d → c → b → a	root: 4<∞ → `(0,4)`. Each new node d,c,b,a → `(0,4)`
2	`"bcd"` (i=1, len=3)	root → d → c → b	root: 3<4 → `(1,3)`. d: 3<4 → `(1,3)`. c → `(1,3)`. b → `(1,3)`
3	`"xbcd"` (i=2, len=4)	root → d → c → b → x	root: 4<3? no, stays `(1,3)`. d: no. c: no. b: no. new node x → `(2,4)`

After insertion the relevant best indices: root (1,3), node d (1,3), node dc (1,3), node dcb (1,3).

Querying / 查詢

Query	Reversed	Walk	Stop node	`bestIdx` → ans
`"cd"`	`"dc"`	root →d →c	node `dc`	`1`
`"bcd"`	`"dcb"`	root →d →c →b	node `dcb`	`1`
`"xyz"`	`"zyx"`	root has no child `z` → stop immediately	root	`1`

Result: [1, 1, 1] ✓ — matches the expected output.

Solution — C

// 演算法：把每個容器字串「反轉後」插入字典樹，後綴比較變成前綴比較。
// Algorithm: insert each container word reversed into a Trie, turning the
// "longest common suffix" comparison into a "longest common prefix" walk.
// 每個節點記錄經過它的最佳容器索引（長度最短、其次索引最小）。
// Each node remembers the best container index passing through it
// (shortest length first, then smallest index). A query reversed then
// walks down as far as possible; the stop node's best index is the answer.

#include <string.h>   // strlen / 取字串長度
#include <stdlib.h>   // malloc, free / 動態記憶體
#include <limits.h>   // INT_MAX / int 的最大值

int* stringIndices(char** wordsContainer, int wordsContainerSize,
                   char** wordsQuery, int wordsQuerySize, int* returnSize) {
    // 先算出所有容器字串的總字元數，用來決定最多需要幾個節點（+1 給根節點）。
    // Count total characters across container words to size the Trie (+1 for root).
    long totalNodes = 1;                                  // 1 = 根節點 / the root
    for (int i = 0; i < wordsContainerSize; i++)
        totalNodes += (long)strlen(wordsContainer[i]);    // 每個字元最多新增一個節點 / each char may add one node

    // children[node][c] = 第 node 個節點、字母 c 方向的子節點編號，-1 表示沒有。
    // children[node][c] = child node id along letter c from `node`; -1 means none.
    // 寫法 int (*children)[26] 代表「指向一個含 26 個 int 的陣列」的指標，
    // i.e. a pointer to arrays-of-26-ints, so children[node] is one node's 26 slots.
    int (*children)[26] = malloc(sizeof(int[26]) * totalNodes);
    int *bestIdx = malloc(sizeof(int) * totalNodes);      // 每節點的最佳索引 / best index per node
    int *bestLen = malloc(sizeof(int) * totalNodes);      // 對應的字串長度 / its word length

    // 初始化根節點（編號 0）。 / Initialise the root (id 0).
    for (int j = 0; j < 26; j++) children[0][j] = -1;     // 26 個方向都還沒有子節點 / no children yet
    bestIdx[0] = -1;                                      // 尚未放入任何字串 / no word stored yet
    bestLen[0] = INT_MAX;                                 // 用最大值，任何真實長度都比它小 / so any real length wins
    int cnt = 1;                                          // 已使用節點數，下一個新節點用此編號 / next free node id

    // ---- 插入每個容器字串（反轉走訪）---- / insert each container word, reversed
    for (int i = 0; i < wordsContainerSize; i++) {
        int len = (int)strlen(wordsContainer[i]);         // 這個字串的長度 / length of this word
        int node = 0;                                     // 從根節點開始 / start at the root

        // 根節點代表「空後綴」，每個字串都會更新它。 / root = empty suffix, every word touches it.
        if (len < bestLen[0]) { bestLen[0] = len; bestIdx[0] = i; }  // 只在更短時覆蓋 → 同長保留較早者 / strict < keeps earliest on ties

        // 從字串「最後一個字元」往前走，等同走訪反轉字串。
        // Walk from the LAST character backwards = traversing the reversed string.
        for (int p = len - 1; p >= 0; p--) {
            int c = wordsContainer[i][p] - 'a';           // 字母轉成 0..25 的索引 / map 'a'..'z' to 0..25
            if (children[node][c] == -1) {                // 這個方向還沒有節點就新建一個 / create child if missing
                children[node][c] = cnt;                  // 記下新節點的編號 / link to the new node id
                for (int j = 0; j < 26; j++) children[cnt][j] = -1;  // 新節點清空子節點 / clear its children
                bestIdx[cnt] = -1;                        // 新節點尚無最佳值 / no best yet
                bestLen[cnt] = INT_MAX;
                cnt++;                                    // 消耗一個節點編號 / consume one node id
            }
            node = children[node][c];                     // 走到子節點 / descend into the child
            // 此節點代表目前已匹配的這段後綴，更新它的最佳索引。
            // This node represents the suffix matched so far; update its best index.
            if (len < bestLen[node]) { bestLen[node] = len; bestIdx[node] = i; }
        }
    }

    // ---- 回答每個查詢 ---- / answer each query
    int *ans = malloc(sizeof(int) * wordsQuerySize);      // 結果陣列 / the output array
    for (int i = 0; i < wordsQuerySize; i++) {
        int len = (int)strlen(wordsQuery[i]);
        int node = 0;                                     // 從根節點開始往下走 / start descending from root
        for (int p = len - 1; p >= 0; p--) {              // 同樣反轉走訪 / reversed walk again
            int c = wordsQuery[i][p] - 'a';
            if (children[node][c] == -1) break;           // 沒有共同後綴可延伸就停 / stop when no shared suffix
            node = children[node][c];                     // 否則繼續往下 / otherwise go deeper
        }
        ans[i] = bestIdx[node];                           // 停下節點的最佳索引就是答案 / best index here is the answer
    }

    *returnSize = wordsQuerySize;                         // 告訴呼叫者結果長度 / report output length
    free(children); free(bestIdx); free(bestLen);         // 釋放 Trie 記憶體，避免洩漏 / free Trie memory
    return ans;                                           // ans 由 LeetCode 接手釋放 / LeetCode frees ans
}

Solution — C++

// 演算法與 C 版相同：反轉後插入字典樹，後綴比較變前綴走訪。
// Same algorithm as the C version: reverse words, insert into a Trie, so the
// longest-common-suffix query becomes a longest-common-prefix walk.
// 每個節點存「最佳容器索引」：長度最短優先，長度相同取較早索引。
// Each node stores the best container index: shortest length, ties broken by earliest index.

#include <vector>
#include <string>
#include <array>
#include <climits>
using namespace std;

class Solution {
public:
    vector<int> stringIndices(vector<string>& wordsContainer, vector<string>& wordsQuery) {
        // Trie 節點：26 個子節點（用 array 固定大小）+ 目前最佳索引與其長度。
        // Trie node: 26 children (fixed-size std::array) + the current best index and its length.
        struct Node {
            array<int, 26> children;   // children[c] = 子節點在 vector 中的位置，-1 表示沒有 / child id, -1 = none
            int bestIdx;               // 最佳容器索引 / best container index
            int bestLen;               // 最佳字串長度 / that word's length
            Node() : bestIdx(-1), bestLen(INT_MAX) { children.fill(-1); } // 建構時全部初始化 / init on construction
        };

        // 用 vector<Node> 當作節點池，索引就是節點編號；trie[0] 是根節點。
        // A vector<Node> is our node pool; the index is the node id; trie[0] is the root.
        vector<Node> trie(1);          // 先放一個根節點 / start with just the root

        // ---- 插入每個容器字串 ---- / insert each container word
        for (int i = 0; i < (int)wordsContainer.size(); i++) {
            const string& w = wordsContainer[i];   // const& 避免複製字串 / reference avoids copying the string
            int len = (int)w.size();
            int node = 0;                           // 從根開始 / start at root

            // 根節點對應空後綴，先更新它。 / root = empty suffix; update it first.
            if (len < trie[node].bestLen) { trie[node].bestLen = len; trie[node].bestIdx = i; } // strict < 保留較早者 / keeps earliest on ties

            // 從末尾往前 = 走訪反轉字串。 / iterate from the end = traverse the reversed word.
            for (int p = len - 1; p >= 0; p--) {
                int c = w[p] - 'a';                 // 字母轉 0..25 / map letter to 0..25
                if (trie[node].children[c] == -1) { // 需要新節點 / need to create a child
                    trie[node].children[c] = (int)trie.size();  // 新節點編號 = 目前大小 / new id = current size
                    trie.emplace_back();            // 在尾端就地建構一個 Node / build a Node in place at the back
                    // 注意：emplace_back 可能讓 vector 重新配置記憶體，
                    // 但我們全程用「索引」而非指標/參照存取，所以安全。
                    // Note: emplace_back may reallocate, but we only use indices (not pointers), so it's safe.
                }
                node = trie[node].children[c];      // 走到子節點 / descend
                if (len < trie[node].bestLen) { trie[node].bestLen = len; trie[node].bestIdx = i; } // 更新此後綴節點 / update this suffix node
            }
        }

        // ---- 回答查詢 ---- / answer the queries
        vector<int> ans;
        ans.reserve(wordsQuery.size());            // 預先配置容量，避免多次重新配置 / reserve to avoid reallocations
        for (const string& q : wordsQuery) {       // range-for：逐一取得每個查詢字串 / range-for over queries
            int node = 0;                          // 從根往下 / descend from root
            for (int p = (int)q.size() - 1; p >= 0; p--) {
                int c = q[p] - 'a';
                if (trie[node].children[c] == -1) break; // 無法再延伸共同後綴 / no further shared suffix
                node = trie[node].children[c];     // 繼續往下 / go deeper
            }
            ans.push_back(trie[node].bestIdx);     // 停下節點的最佳索引 / best index at the stop node
        }
        return ans;
    }
};

複雜度 / Complexity

Time: O(C + Q)，其中 C 是所有容器字串的字元總數、Q 是所有查詢字串的字元總數。每個容器字元只在插入時被處理一次（沿路徑下降並做一次常數時間的比較更新），每個查詢字元也只在查詢時被處理一次。建立與查詢都與字元總數線性相關，不再有暴力法的「兩兩相乘」。 Each container character is touched once during insertion (one descent step + a constant-time best update); each query character is touched once during the walk. No pairwise blow-up, so total work is linear in the combined input size.
Space: O(C × 26)。最壞情況下字典樹的節點數約等於容器字元總數 C，而每個節點固定有 26 個子節點槽位。輸出陣列另需 O(Q) 空間。 In the worst case the Trie has about C nodes (one per container character), and each node holds 26 child slots. The output array adds O(Q).

Pitfalls & Edge Cases

覆蓋規則必須是「嚴格小於」/ overwrite only on strictly smaller length: 用 if (len < bestLen[node]) 而非 <=。因為我們按索引由小到大插入，嚴格小於才能在長度相同時保留較早的索引；用 <= 會錯誤地選到較晚的字串。
別忘了根節點代表空後綴 / the root encodes the empty suffix: 查詢若第一步就找不到子節點（完全沒有共同後綴，如 "xyz"），會停在根節點。根節點存的是全體最短最早的字串，正好是規則要的答案——所以一定有解，不會回傳 -1。
必須反轉走訪，不要真的反轉字串 / walk from the tail, don't actually reverse: 直接用 p = len-1; p>=0; p-- 從尾巴往前讀，等效於處理反轉字串，省去額外建立反轉字串的時間與空間。
字元轉索引別漏 - 'a' / remember - 'a': 子節點陣列大小是 26，索引必須是 0..25。忘了減 'a' 會用到 'a'(97) 這種值而越界，造成記憶體錯誤。
C 版的記憶體管理 / memory in C: 節點上限要用「容器字元總數 + 1」來估，少算會越界寫入。ans 要交給 LeetCode 釋放，但 Trie 三個陣列（children/bestIdx/bestLen）必須自己 free，否則記憶體洩漏。
C++ 用索引而非指標存節點 / index nodes, not pointers (C++): trie.emplace_back() 可能觸發 vector 重新配置，使先前取得的參照或指標失效。全程用整數編號存取 trie[node] 就不受影響。
長度型別 / lengths fit in int: 單一字串最長 5×10^3，總和最長 5×10^5，都在 int 範圍內；不需要擔心溢位，但估算節點總數時在 C 版用 long 累加較保險。