49. Group Anagrams

題目 / Problem

中文：給定一個字串陣列 strs，把彼此互為「易位構詞」（anagram）的字串分到同一組。所謂易位構詞，是指兩個字串用到完全相同的字母、且每個字母出現次數也相同，只是排列順序不同（例如 "eat" 和 "tea"）。你可以用任意順序回傳答案，組與組之間、組內元素之間的順序都不要求。

English: Given an array of strings strs, group the strings that are anagrams of each other into the same group. Two strings are anagrams if they contain exactly the same letters with the same counts, just in a different order (e.g. "eat" and "tea"). You may return the answer in any order — neither the order of the groups nor the order within a group matters.

Constraints / 限制 - 1 <= strs.length <= 10^4 - 0 <= strs[i].length <= 100 - strs[i] 只由小寫英文字母組成 / consists of lowercase English letters only.

Worked example / 範例

Input:  strs = ["eat","tea","tan","ate","nat","bat"]
Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

"eat", "tea", "ate" 互為易位構詞 → 同一組；"tan", "nat" → 同一組；"bat" 自己一組。

名詞解釋 / Glossary

易位構詞 / anagram：兩字串字母種類與每種字母的數量完全相同，只是順序不同。判斷的關鍵是「字母多重集合」是否相等。/ Two strings using the exact same multiset of letters; only the order differs.
雜湊表 / hash map：一種以「鍵 → 值」存取資料的結構，平均能在 O(1) 時間查找與插入。本題用它把「某個簽名」映射到「屬於該簽名的字串清單」。/ A key→value structure with average O(1) lookup/insert; here it maps a signature to the list of strings sharing it.
簽名 / canonical key (signature)：替每個字串算出一個「同組必相同、不同組必不同」的代表字串。兩種常見做法：把字母排序後的結果，或 26 個字母的出現次數編碼成字串。/ A representative string that is identical for anagrams and different otherwise — either the sorted letters, or the 26 letter-counts encoded as text.
計數陣列 / count array：長度 26 的整數陣列，索引 0..25 對應字母 a..z，記錄每個字母出現幾次。/ A length-26 int array indexed by letter (a→0 … z→25) holding each letter's frequency.
unordered_map（C++）：C++ 標準函式庫提供的雜湊表，鍵唯一、查找平均 O(1)。/ The C++ standard-library hash map: unique keys, average O(1) access.

思路

最直接的暴力想法：對每一對字串都判斷它們是不是易位構詞，是的話就放進同一組。但「判斷兩兩是否同組」要做 O(n²) 次比較，n 最大到 10⁴ 時就是上億次操作，太慢。問題的核心在於：我們其實不需要「兩兩比較」，只要能替每個字串算出一個代表它所屬組別的標記，標記相同就是同一組。這個標記就叫「簽名」。怎麼設計簽名？關鍵觀察是：易位構詞們把字母排序後會得到完全相同的字串，例如 "eat"、"tea"、"ate" 排序後都是 "aet"。於是把排序後的字串當作簽名，丟進雜湊表：簽名當鍵，值是一個清單，存放所有擁有這個簽名的原始字串。掃過一遍陣列，每個字串算簽名、塞進對應的清單，最後把雜湊表裡所有清單收集起來就是答案。雜湊表讓「找出某簽名對應的那組」變成平均 O(1)，整體只需掃一遍。

另一種更快的簽名作法（C 版用這個）：因為字母只有小寫 a–z，可以用一個長度 26 的計數陣列數每個字母出現幾次，再把這 26 個數字編碼成一個字串當簽名。這樣算簽名只要 O(L)（L 是字串長度），比排序的 O(L log L) 更省，而且同樣保證「同組簽名相同、不同組簽名不同」。

The brute-force idea is to compare every pair of strings and check whether they're anagrams, merging matches into groups. But pairwise comparison is O(n²), which blows up when n is 10⁴. The real insight is that we don't need to compare pairs at all — we just need a signature for each string that is identical for anagrams and different otherwise. Since anagrams become the same string once their letters are sorted ("eat", "tea", "ate" all sort to "aet"), we can use the sorted string as the signature. We feed everything into a hash map: the signature is the key, and the value is the list of original strings that produced it. One pass over the array — compute each signature, append the string to its bucket — and finally we collect all the buckets as the answer. The hash map turns "find the right group for this signature" into an average O(1) operation.

An even faster signature (used in the C version): since the alphabet is just lowercase a–z, count each letter into a length-26 array, then encode those 26 counts into a string. Building this signature is O(L) instead of sorting's O(L log L), while still guaranteeing same-group ⇔ same-signature.

逐步走查 / Walkthrough

輸入 / Input: ["eat","tea","tan","ate","nat","bat"]，使用「排序後字串」當簽名，維護一個雜湊表 map: signature → list。

步驟 Step	目前字串 word	排序後簽名 sorted key	雜湊表動作 map action	雜湊表狀態 map state
1	`"eat"`	`"aet"`	新鍵，建立清單 / new key	`{aet: [eat]}`
2	`"tea"`	`"aet"`	鍵已存在，附加 / append	`{aet: [eat, tea]}`
3	`"tan"`	`"ant"`	新鍵，建立清單 / new key	`{aet:[eat,tea], ant:[tan]}`
4	`"ate"`	`"aet"`	鍵已存在，附加 / append	`{aet:[eat,tea,ate], ant:[tan]}`
5	`"nat"`	`"ant"`	鍵已存在，附加 / append	`{aet:[eat,tea,ate], ant:[tan,nat]}`
6	`"bat"`	`"abt"`	新鍵，建立清單 / new key	`{aet:[eat,tea,ate], ant:[tan,nat], abt:[bat]}`

最後 / Finally：收集所有清單值 → [["eat","tea","ate"], ["tan","nat"], ["bat"]]。順序可任意，符合題目要求。/ Collect all map values; any order is accepted.

Solution — C

// 演算法 / Algorithm:
// 對每個字串算一個長度 26 的字母計數，編碼成「簽名」字串；
// 簽名相同 = 同組易位構詞。用簽名陣列分組（這裡用線性比對找已存在的簽名）。
// For each string, build a 26-letter count, encode it into a signature string;
// equal signatures = same anagram group. Group by matching signatures.

#include <stdlib.h>   // malloc / free / qsort
#include <string.h>   // strcmp / strlen / strcpy

// 比較函式給 qsort 用：把字串裡的字元由小到大排序
// Comparator for qsort: sorts characters ascending.
static int cmp_char(const void *a, const void *b) {
    // *(char*)a 把 void* 轉回 char* 再取值 / cast void* back to char* and dereference
    return (*(const char *)a) - (*(const char *)b);
}

// 回傳值：char***（指向「字串陣列的陣列」），即每組是一個 char**
// Returns char*** : an array of groups, each group being a char** (array of strings).
char ***groupAnagrams(char **strs, int strsSize,
                      int *returnSize, int **returnColumnSizes) {
    // keys[i] 存第 i 組的簽名；用來判斷新字串該歸到哪一組
    // keys[i] holds group i's signature, used to decide which group a string joins.
    char **keys   = (char **)malloc(sizeof(char *) * strsSize);   // 最多 strsSize 組 / at most strsSize groups
    char ***groups = (char ***)malloc(sizeof(char **) * strsSize); // 每組的字串清單 / list per group
    int *counts   = (int *)malloc(sizeof(int) * strsSize);         // 每組目前有幾個字串 / size of each group
    int groupCnt  = 0;                                             // 目前共幾組 / number of groups so far

    for (int i = 0; i < strsSize; i++) {
        // ---- 步驟 A：替 strs[i] 算簽名（排序它的字元複本）----
        // Step A: compute signature of strs[i] by sorting a copy of its chars.
        int len = (int)strlen(strs[i]);                 // 字串長度（不含結尾 ''）/ length excluding NUL
        char *sig = (char *)malloc(len + 1);            // +1 給字串結尾 '' / +1 for the terminator
        strcpy(sig, strs[i]);                           // 複製一份，避免改到原字串 / copy so we don't mutate input
        qsort(sig, len, sizeof(char), cmp_char);        // 原地排序字元 / sort chars in place -> signature

        // ---- 步驟 B：在已有的組裡找相同簽名 ----
        // Step B: search existing groups for a matching signature.
        int g = -1;                                     // -1 表示還沒找到 / -1 means not found yet
        for (int j = 0; j < groupCnt; j++) {
            if (strcmp(keys[j], sig) == 0) {            // strcmp==0 代表兩簽名相等 / equal signatures
                g = j;                                  // 找到所屬組別 / found the group index
                break;
            }
        }

        if (g == -1) {                                  // 沒有相同簽名 → 開新組 / no match -> new group
            g = groupCnt++;                             // 取一個新組編號 / allocate a new group id
            keys[g]   = sig;                            // 這組的簽名就是 sig（沿用，不再 free）/ keep sig as the key
            groups[g] = (char **)malloc(sizeof(char *) * strsSize); // 預留空間 / reserve room
            counts[g] = 0;                              // 新組目前 0 個元素 / starts empty
        } else {
            free(sig);                                  // 已有此組，sig 用不到了，釋放避免漏記憶體 / free unused copy
        }

        // ---- 步驟 C：把原始字串放進該組 ----
        // Step C: append the ORIGINAL string into its group.
        groups[g][counts[g]] = strs[i];                 // 存原字串指標 / store pointer to original string
        counts[g]++;                                    // 該組元素數 +1 / group size +1
    }

    // ---- 步驟 D：依 LeetCode 介面填寫回傳資訊 ----
    // Step D: fill the output parameters required by LeetCode.
    *returnSize = groupCnt;                             // 共幾組 / number of groups
    *returnColumnSizes = (int *)malloc(sizeof(int) * groupCnt); // 每組大小的陣列 / sizes array
    for (int j = 0; j < groupCnt; j++) {
        (*returnColumnSizes)[j] = counts[j];            // 第 j 組有幾個字串 / size of group j
        free(keys[j]);                                  // 簽名已用完，釋放 / signature no longer needed
    }

    free(keys);     // 釋放輔助陣列 / free helper arrays
    free(counts);
    return groups;  // groups 與其子陣列交給 LeetCode 管理 / hand groups to the harness
}

Solution — C++

// 演算法 / Algorithm:
// 對每個字串把字元排序成「簽名」，用 unordered_map 以簽名為鍵收集字串。
// 最後把每個桶（bucket）搬進結果向量。
// Sort each string's chars into a signature; collect strings in an unordered_map
// keyed by signature; finally move each bucket into the result vector.

#include <vector>
#include <string>
#include <unordered_map>
#include <algorithm>   // std::sort

class Solution {
public:
    std::vector<std::vector<std::string>> groupAnagrams(std::vector<std::string>& strs) {
        // unordered_map：雜湊表，鍵=簽名，值=該簽名的字串清單
        // hash map: key = signature, value = list of strings with that signature.
        std::unordered_map<std::string, std::vector<std::string>> groups;

        // range-for：依序取出 strs 裡每個字串；const& 避免複製、且不修改原值
        // range-for over strs; const& avoids copying and forbids mutation.
        for (const std::string& s : strs) {
            std::string key = s;            // 複製一份當簽名來源 / copy to build the signature
            std::sort(key.begin(), key.end()); // 排序字元 → 簽名 / sort chars -> signature
            // groups[key] 若鍵不存在會自動建立空 vector，再 push_back 加入字串
            // operator[] auto-creates an empty vector if key is new, then we append.
            groups[key].push_back(s);
        }

        std::vector<std::vector<std::string>> result;   // 最終答案 / final answer
        result.reserve(groups.size());                  // 預留容量，減少重新配置 / reserve to avoid reallocs

        // 結構化綁定 [key, bucket]：把 map 的每個 pair 拆成鍵與值
        // structured binding splits each map entry into key and value.
        for (auto& [key, bucket] : groups) {
            // std::move：把 bucket 內容「搬」進結果，避免再複製一份字串
            // std::move transfers ownership instead of copying the vector.
            result.push_back(std::move(bucket));
        }
        return result;
    }
};

複雜度 / Complexity

Time: O(n · L log L) — 設 n 是字串個數、L 是字串平均長度。每個字串要排序字元花 O(L log L)，共 n 個，雜湊表查找/插入平均 O(L)（因為要對長度 L 的鍵做雜湊與比較）。排序項主導，故總和為 O(n · L log L)。C 版用計數法算簽名則可降到 O(n · L)。/ n strings, average length L. Sorting each string's chars dominates at O(L log L); hashing the length-L key is O(L). Total O(n · L log L). The C count-based signature lowers this to O(n · L).
Space: O(n · L) — 雜湊表（或分組陣列）要存下所有字串與其簽名，總字元量級為 n · L。/ The map/groups store every string plus signatures, on the order of n · L characters.

Pitfalls & Edge Cases

空字串 "" / empty string：strs = [""] 是合法輸入。空字串排序後仍是空字串，簽名為 ""，會正確獨立成一組 [[""]]。別在程式裡對長度 0 特判而漏掉它。/ "" sorts to "", forms its own group correctly — don't special-case length 0 and drop it.
C 裡別改到原字串 / don't mutate the input in C：排序必須在複本上做（strcpy 後再 qsort），否則排序會破壞原字串，最後存進組裡的就不是原貌了。/ Sort a copy; sorting the original would corrupt the strings you must return.
記憶體釋放 / freeing memory in C：當簽名命中已存在的組時，那份 sig 用不到了要 free，否則每個重複字串都漏一塊記憶體。新組則保留 sig 當鍵、最後統一釋放。/ Free sig on a cache hit; keep it as the key for new groups and free at the end.
回傳介面 / LeetCode return contract (C)：returnSize 是組數，returnColumnSizes 是每組大小的陣列，兩者都要正確填寫且後者要 malloc，漏填會判錯或當掉。/ Fill *returnSize and allocate *returnColumnSizes; forgetting either fails the judge.
用排序後字串而非原字串當鍵 / key on the signature, not the raw string：若不小心用原字串當鍵，每個字串都自成一組，完全失去分組效果。/ Keying on the raw string puts every word in its own group — bug.
大小寫假設 / lowercase assumption：題目保證全小寫，計數陣列用長度 26 即可；若輸入可能含大寫或其他字元，計數法需擴大範圍，排序法則天然不受影響。/ Counts of size 26 rely on all-lowercase input; sorting is robust regardless.