290. Word Pattern

題目 / Problem

中文給定一個字串 pattern（只含小寫字母）和一個字串 s（由空格分隔的單字組成），判斷 s 是否「遵循」pattern。

「遵循」的意思是 pattern 中的字母和 s 中的單字之間存在雙射（一一對應）： - pattern 中每個字母只能對應到 s 中的一個固定單字。 - s 中每個單字也只能對應到 pattern 中的一個固定字母。 - 不能有兩個字母對應到同一個單字，也不能有兩個單字對應到同一個字母。

English Given a string pattern (lowercase letters) and a string s (words separated by single spaces), decide whether s follows pattern.

"Follow" means there is a bijection (one-to-one, two-way mapping) between letters in pattern and words in s: - Each letter maps to exactly one word. - Each word maps to exactly one letter. - No two letters share a word, and no two words share a letter.

Constraints / 限制 - 1 <= pattern.length <= 300 - pattern 只含小寫英文字母 / only lowercase letters. - 1 <= s.length <= 3000 - s 只含小寫字母和空格，無前後空格，單字之間用單一空格分隔 / lowercase letters and single spaces, no leading/trailing spaces.

Example / 範例 - pattern = "abba", s = "dog cat cat dog" → true（a↔dog、b↔cat） - pattern = "abba", s = "dog cat cat fish" → false（a 結尾應對 dog，卻是 fish） - pattern = "aaaa", s = "dog cat cat dog" → false（a 不能同時對應多個單字）

名詞解釋 / Glossary

雙射 / bijection：兩個集合之間的「一對一且雙向」的對應關係。每個元素都有唯一的搭檔，沒有人重複、沒有人落單。本題就是要驗證字母與單字之間是不是雙射。 / A pairing where each element on one side maps to exactly one on the other side, and vice-versa — no sharing, no duplicates.
雜湊表 / hash map (hash table)：一種「鍵→值」的查表資料結構，平均能在 O(1) 時間內查詢、插入。C++ 用 unordered_map，C 裡我們用簡單的陣列模擬。 / A key→value lookup structure with average O(1) insert/find. unordered_map in C++; we emulate it with arrays in C.
正向映射 / forward map：記錄「字母 → 單字」的對應。 / Records letter → word.
反向映射 / reverse map：記錄「單字 → 字母」的對應，用來確保沒有兩個字母搶同一個單字。 / Records word → letter, so two letters can't grab the same word.
分詞 / tokenizing：把一整串用空格分隔的文字切成一個個單字。C 用 strtok，C++ 用 istringstream。 / Splitting a space-separated string into individual words.

思路

最直覺的暴力想法是：對每一對（字母, 單字）都去掃描前面所有出現過的配對，檢查有沒有矛盾。但這樣每一步都要回頭比對，整體會變成 O(n²)，而且程式碼很囉嗦。其實我們只需要兩張查表就能在一次掃描內解決。關鍵觀察是：「遵循 pattern」等價於字母與單字之間是雙射，而雙射可以拆成兩個方向同時成立——字母對單字是固定的，單字對字母也是固定的。所以我們用兩張雜湊表：char2word（字母→單字）與 word2char（單字→字母）。我們把 pattern 的第 i 個字母和 s 的第 i 個單字配對，逐一檢查：如果這個字母之前已經有對應的單字，那它現在必須還是同一個單字，否則矛盾；如果這個單字之前已經有對應的字母，那它現在必須還是同一個字母，否則矛盾；都沒問題就把這對關係記下來。最後別忘了字母數量必須等於單字數量，否則一定不是完整匹配。兩張表分別擋住「一個字母配到兩個單字」和「兩個字母搶同一個單字」這兩種違規，合起來剛好保證雙射。

The brute-force idea — for each (letter, word) pair, rescan all earlier pairs to look for a clash — works but costs O(n²) and reads awkwardly. We can do it in a single pass with two lookup tables. The key insight is that "following the pattern" is exactly a bijection between letters and words, and a bijection holds iff both directions are consistent: each letter pins down one word, and each word pins down one letter. So we keep two hash maps, char2word (letter→word) and word2char (word→letter). Walking the i-th letter of pattern together with the i-th word of s, we check: if the letter already has a recorded word, it must still match; if the word already has a recorded letter, it must still match; otherwise we record the new pairing. We also require the count of letters to equal the count of words, or it can't be a full match. The forward table blocks "one letter → two different words" and the reverse table blocks "two letters → the same word"; together they guarantee a true bijection.

逐步走查 / Walkthrough

輸入 / Input: pattern = "abba", s = "dog cat cat dog"

兩張表一開始都是空的 / Both maps start empty.

i	字母 letter	單字 word	char2word 查 / check	word2char 查 / check	動作 / action	結果 / state
0	`a`	`dog`	`a` 沒紀錄 / unseen	`dog` 沒紀錄 / unseen	新增 `a→dog`, `dog→a`	`{a:dog}`, `{dog:a}`
1	`b`	`cat`	`b` 沒紀錄 / unseen	`cat` 沒紀錄 / unseen	新增 `b→cat`, `cat→b`	`{a:dog,b:cat}`, `{dog:a,cat:b}`
2	`b`	`cat`	`b→cat` ✓ 一致 / matches	`cat→b` ✓ 一致 / matches	無需更動 / consistent	不變 / unchanged
3	`a`	`dog`	`a→dog` ✓ 一致 / matches	`dog→a` ✓ 一致 / matches	無需更動 / consistent	不變 / unchanged

走完全部 4 對都沒有矛盾，且字母數 (4) = 單字數 (4)，回傳 true。 All 4 pairs are consistent and counts match (4 = 4), so return true.

對比 Example 2（s = "dog cat cat fish"）：在 i=3 時字母 a 之前記錄是 dog，現在單字是 fish，char2word 查到不一致 → 立刻回傳 false。 For Example 2, at i=3 letter a was recorded as dog but now the word is fish — the forward map mismatches, so return false immediately.

Solution — C

// 演算法 / Algorithm:
// 用兩個方向的對應表驗證「字母 ↔ 單字」是雙射。
// char2word[26] 存每個字母對應的單字；用一個雜湊集合記住每個單字已被哪個字母佔用。
// 邊用 strtok 切單字，邊逐字母比對；任何不一致即回傳 false。
// We verify a letter↔word bijection with two-way mapping. char2word holds each
// letter's word; a small hash set remembers which letter owns each word.

#include <stdbool.h>   // 提供 bool / true / false / gives us the bool type
#include <string.h>    // 提供 strtok, strcmp, strdup / string helpers
#include <stdlib.h>    // 提供 malloc, free / memory helpers

// 簡單雜湊表節點：把「單字 → 佔用它的字母」存成鏈結串列
// A hash-table node mapping a word to the letter that owns it.
typedef struct Node {
    char *word;          // 單字字串 / the word text
    char letter;         // 佔用此單字的字母 / the letter that claimed this word
    struct Node *next;   // 同一個桶內的下一個節點 / next node in the same bucket
} Node;

#define BUCKETS 1024     // 雜湊桶數量，取 2 的次方方便用 & 取餘 / number of buckets

// 字串雜湊函式：把單字轉成一個桶的索引
// String hash: turn a word into a bucket index.
static unsigned hashStr(const char *s) {
    unsigned h = 5381;                       // djb2 常見起始值 / classic djb2 seed
    for (; *s; s++)                          // 逐字元 / walk each character
        h = h * 33 + (unsigned char)*s;      // 經典 djb2 混合 / djb2 mixing step
    return h & (BUCKETS - 1);                // 取桶索引（等同 % BUCKETS）/ bucket index
}

bool wordPattern(char *pattern, char *s) {
    char *char2word[26] = {0};   // 字母→單字，初值全為 NULL 代表「尚未對應」/ letter→word, NULL = unmapped
    Node *buckets[BUCKETS] = {0};// 單字→字母 的雜湊表，全部桶先清空 / word→letter hash table, empty
    bool ok = true;              // 最終答案，先假設成立 / answer, assume true until proven false

    int pi = 0;                  // pattern 的索引（第幾個字母）/ index into pattern
    int plen = (int)strlen(pattern); // pattern 長度，最後比對單字數用 / pattern length

    // strtok 會就地切割 s，所以複製一份避免破壞輸入
    // strtok modifies s in place, so copy it to keep the original intact.
    char *copy = strdup(s);      // strdup = malloc + 複製字串 / duplicate the string

    // strtok：第一次傳字串，之後傳 NULL，會依分隔符 " " 逐段回傳單字
    // strtok: first call passes the string, later calls pass NULL to keep going.
    char *word = strtok(copy, " ");

    while (word != NULL) {       // 還有單字就繼續 / loop while words remain
        if (pi >= plen) {        // 單字比字母多 → 不可能匹配 / more words than letters
            ok = false;
            break;
        }
        char c = pattern[pi];            // 目前要配對的字母 / current letter
        int li = c - 'a';                // 把 'a'..'z' 轉成 0..25 當陣列索引 / letter as 0..25

        // --- 檢查正向：字母 → 單字 ---
        // Check forward direction: letter → word.
        if (char2word[li] != NULL) {                 // 這個字母以前對應過 / letter seen before
            if (strcmp(char2word[li], word) != 0) {  // 但單字不同 → 矛盾 / different word now
                ok = false;
                break;
            }
            // 一致就不必動表，直接看下一個 / consistent, fall through
        } else {
            // --- 字母是新的：先確認這個單字沒被別的字母佔走（反向）---
            // New letter: make sure no other letter already owns this word (reverse).
            unsigned b = hashStr(word);              // 找到單字所在的桶 / locate bucket
            Node *cur = buckets[b];
            char owner = 0;                          // 0 代表「沒人佔用」/ 0 means unowned
            while (cur) {                            // 走訪桶內鏈結串列 / scan the chain
                if (strcmp(cur->word, word) == 0) {  // 找到相同單字 / found this word
                    owner = cur->letter;             // 記下佔用者 / record its owner letter
                    break;
                }
                cur = cur->next;
            }
            if (owner != 0 && owner != c) {          // 單字已被「別的」字母佔走 / word taken by another letter
                ok = false;
                break;
            }
            if (owner == 0) {                        // 真的是新單字才插入反向表 / brand-new word → insert
                Node *n = (Node *)malloc(sizeof(Node)); // 配置節點 / allocate node
                n->word = strdup(word);              // 複製單字（copy 之後會被釋放）/ own a copy of the word
                n->letter = c;                       // 記住佔用字母 / store owning letter
                n->next = buckets[b];                // 頭插法接到桶上 / push to front of bucket
                buckets[b] = n;
            }
            char2word[li] = word;                    // 建立正向對應（指向 copy 內字串）/ set forward map
        }

        pi++;                            // 換下一個字母 / advance letter index
        word = strtok(NULL, " ");        // 取下一個單字 / grab next word
    }

    // 走完後，字母數必須剛好等於單字數，否則字母比單字多 → false
    // After the loop, letters and words must be equal in count.
    if (ok && pi != plen) ok = false;    // pi 是已用掉的字母數 / pi = letters consumed

    // 釋放反向表所有節點與字串，避免記憶體洩漏
    // Free every node and its duplicated word to avoid leaks.
    for (int b = 0; b < BUCKETS; b++) {
        Node *cur = buckets[b];
        while (cur) {
            Node *nxt = cur->next;       // 先存下一個再釋放當前 / save next before freeing
            free(cur->word);             // 釋放複製的單字 / free duplicated word
            free(cur);                   // 釋放節點本身 / free the node
            cur = nxt;
        }
    }
    free(copy);                          // 釋放整份字串副本 / free the string copy

    return ok;                           // 回傳結果 / return the verdict
}

Solution — C++

// 演算法 / Algorithm:
// 用兩張 unordered_map 維護「字母→單字」與「單字→字母」兩個方向的對應，
// 一邊用 istringstream 切詞一邊比對；任一方向衝突即 false，最後字數必須相等。
// Two unordered_maps track letter→word and word→letter; we split words with a
// stringstream and check both directions, requiring equal counts at the end.

#include <string>
#include <sstream>          // istringstream：方便用空格切詞 / split words by spaces
#include <unordered_map>    // 雜湊表 / hash maps

class Solution {
public:
    bool wordPattern(std::string pattern, std::string s) {
        // 兩張對應表 / the two mapping tables
        std::unordered_map<char, std::string> char2word; // 字母 → 單字 / letter → word
        std::unordered_map<std::string, char> word2char; // 單字 → 字母 / word → letter

        std::istringstream iss(s); // 把 s 包成可逐詞讀取的串流 / wrap s as a token stream
        std::string word;          // 暫存當前單字 / holds the current word
        size_t i = 0;              // 目前對到 pattern 的第幾個字母 / index into pattern

        // iss >> word 每次讀出一個以空白分隔的單字，讀完回傳 false 結束迴圈
        // `iss >> word` extracts one whitespace-separated word each loop.
        while (iss >> word) {
            if (i >= pattern.size()) return false; // 單字比字母多 / more words than letters
            char c = pattern[i];                   // 當前字母 / current letter

            // auto + find：在表中查 c；it == end() 代表「沒找到」
            // `find` returns an iterator; == end() means "not present".
            auto fw = char2word.find(c);
            if (fw != char2word.end()) {
                // 字母已對應過，必須是同一個單字 / letter mapped before → must match
                if (fw->second != word) return false; // fw->second 是對應的單字 / its word
            } else {
                // 字母是新的：確認單字未被別的字母佔用 / new letter → word must be free
                auto rv = word2char.find(word);
                if (rv != word2char.end()) {
                    // 單字已被某字母佔用，且不是 c → 衝突 / word owned by another letter
                    if (rv->second != c) return false;
                } else {
                    // 兩個方向都是全新，建立雙向對應 / brand-new pair → record both ways
                    char2word[c] = word;   // operator[] 不存在就插入 / insert forward
                    word2char[word] = c;   // 插入反向 / insert reverse
                }
            }
            ++i; // 換下一個字母 / advance to next letter
        }

        // 字母數必須等於單字數，否則字母比單字多 → false
        // Letters must equal words; leftover letters mean no full match.
        return i == pattern.size();
    }
};

複雜度 / Complexity

Time / 時間：O(n + m) — n 是 pattern 的長度，m 是 s 的長度。我們只把 pattern 掃一次、把 s 切詞掃一次，每個字母與單字的查表/插入平均都是 O(1)（字串雜湊與比較的成本攤在單字長度上）。沒有巢狀回頭掃描，所以是線性的。 / We pass over pattern once and over s once; each map operation is average O(1). No nested rescans, so it's linear in the input size.
Space / 空間：O(k) — k 是不同字母與不同單字的數量（最多各 26 個字母、若干單字）。兩張表加起來儲存的對應數量受字母與單字種類數限制；C 版另外複製一份 s 也是 O(m)。 / The two maps store at most one entry per distinct letter/word; the C version also duplicates s, which is O(m).

Pitfalls & Edge Cases

只檢查單一方向會錯 / Checking only one direction fails：若只記「字母→單字」，pattern="ab", s="dog dog" 會誤判為 true（a→dog、b→dog，但兩個字母搶同一個單字）。必須同時維護反向表 word→char。 / A single-direction map lets two letters share one word; both directions are required.
長度不一致 / Length mismatch：字母數和單字數不同時一定是 false。範例如 pattern="aaaa", s="dog cat"。程式用「迴圈中發現單字超量」與「結尾 i == size」兩道檢查涵蓋兩種方向。 / Unequal letter/word counts must return false; we guard both "too many words" mid-loop and "leftover letters" at the end.
strtok 會破壞原字串 / strtok mutates its input：C 版先 strdup(s) 再切，避免改壞傳入的 s，也避免修改唯讀字面值。 / We copy s before tokenizing because strtok writes into the buffer.
記憶體洩漏 / Memory leaks：C 版每個 malloc/strdup 都要對應 free，結尾統一釋放整張雜湊表與字串副本。 / Every allocation in C is freed at the end to avoid leaks.
空字串對應不存在 / No empty words：題目保證單字非空、無前後空格、單一空格分隔，所以不需處理空 token；但若放寬輸入，istringstream 與 strtok 對連續空格的行為會不同，需留意。 / The constraints guarantee single-space separation, so we don't handle empty tokens — but be aware strtok and istringstream differ on consecutive spaces.
字母索引 / Letter indexing：C 版用 c - 'a' 把字母轉成 0–25 陣列索引；只在保證輸入是小寫字母時才安全。 / c - 'a' is valid only because pattern is guaranteed lowercase.