290. Word Pattern
題目 / Problem
中文
給定一個字串 pattern(只含小寫字母)和一個字串 s(由空格分隔的單字組成),判斷 s 是否「遵循」pattern。
「遵循」的意思是 pattern 中的字母和 s 中的單字之間存在雙射(一一對應):
- pattern 中每個字母只能對應到 s 中的一個固定單字。
- s 中每個單字也只能對應到 pattern 中的一個固定字母。
- 不能有兩個字母對應到同一個單字,也不能有兩個單字對應到同一個字母。
English
Given a string pattern (lowercase letters) and a string s (words separated by single spaces), decide whether s follows pattern.
"Follow" means there is a bijection (one-to-one, two-way mapping) between letters in pattern and words in s:
- Each letter maps to exactly one word.
- Each word maps to exactly one letter.
- No two letters share a word, and no two words share a letter.
Constraints / 限制
- 1 <= pattern.length <= 300
- pattern 只含小寫英文字母 / only lowercase letters.
- 1 <= s.length <= 3000
- s 只含小寫字母和空格,無前後空格,單字之間用單一空格分隔 / lowercase letters and single spaces, no leading/trailing spaces.
Example / 範例
- pattern = "abba", s = "dog cat cat dog" → true(a↔dog、b↔cat)
- pattern = "abba", s = "dog cat cat fish" → false(a 結尾應對 dog,卻是 fish)
- pattern = "aaaa", s = "dog cat cat dog" → false(a 不能同時對應多個單字)
名詞解釋 / Glossary
- 雙射 / bijection:兩個集合之間的「一對一且雙向」的對應關係。每個元素都有唯一的搭檔,沒有人重複、沒有人落單。本題就是要驗證字母與單字之間是不是雙射。 / A pairing where each element on one side maps to exactly one on the other side, and vice-versa — no sharing, no duplicates.
- 雜湊表 / hash map (hash table):一種「鍵→值」的查表資料結構,平均能在 O(1) 時間內查詢、插入。C++ 用
unordered_map,C 裡我們用簡單的陣列模擬。 / A key→value lookup structure with average O(1) insert/find.unordered_mapin C++; we emulate it with arrays in C. - 正向映射 / forward map:記錄「字母 → 單字」的對應。 / Records letter → word.
- 反向映射 / reverse map:記錄「單字 → 字母」的對應,用來確保沒有兩個字母搶同一個單字。 / Records word → letter, so two letters can't grab the same word.
- 分詞 / tokenizing:把一整串用空格分隔的文字切成一個個單字。C 用
strtok,C++ 用istringstream。 / Splitting a space-separated string into individual words.
思路
最直覺的暴力想法是:對每一對(字母, 單字)都去掃描前面所有出現過的配對,檢查有沒有矛盾。但這樣每一步都要回頭比對,整體會變成 O(n²),而且程式碼很囉嗦。其實我們只需要兩張查表就能在一次掃描內解決。關鍵觀察是:「遵循 pattern」等價於字母與單字之間是雙射,而雙射可以拆成兩個方向同時成立——字母對單字是固定的,單字對字母也是固定的。所以我們用兩張雜湊表:char2word(字母→單字)與 word2char(單字→字母)。我們把 pattern 的第 i 個字母和 s 的第 i 個單字配對,逐一檢查:如果這個字母之前已經有對應的單字,那它現在必須還是同一個單字,否則矛盾;如果這個單字之前已經有對應的字母,那它現在必須還是同一個字母,否則矛盾;都沒問題就把這對關係記下來。最後別忘了字母數量必須等於單字數量,否則一定不是完整匹配。兩張表分別擋住「一個字母配到兩個單字」和「兩個字母搶同一個單字」這兩種違規,合起來剛好保證雙射。
The brute-force idea — for each (letter, word) pair, rescan all earlier pairs to look for a clash — works but costs O(n²) and reads awkwardly. We can do it in a single pass with two lookup tables. The key insight is that "following the pattern" is exactly a bijection between letters and words, and a bijection holds iff both directions are consistent: each letter pins down one word, and each word pins down one letter. So we keep two hash maps, char2word (letter→word) and word2char (word→letter). Walking the i-th letter of pattern together with the i-th word of s, we check: if the letter already has a recorded word, it must still match; if the word already has a recorded letter, it must still match; otherwise we record the new pairing. We also require the count of letters to equal the count of words, or it can't be a full match. The forward table blocks "one letter → two different words" and the reverse table blocks "two letters → the same word"; together they guarantee a true bijection.
逐步走查 / Walkthrough
輸入 / Input: pattern = "abba", s = "dog cat cat dog"
兩張表一開始都是空的 / Both maps start empty.
| i | 字母 letter | 單字 word | char2word 查 / check | word2char 查 / check | 動作 / action | 結果 / state |
|---|---|---|---|---|---|---|
| 0 | a |
dog |
a 沒紀錄 / unseen |
dog 沒紀錄 / unseen |
新增 a→dog, dog→a |
{a:dog}, {dog:a} |
| 1 | b |
cat |
b 沒紀錄 / unseen |
cat 沒紀錄 / unseen |
新增 b→cat, cat→b |
{a:dog,b:cat}, {dog:a,cat:b} |
| 2 | b |
cat |
b→cat ✓ 一致 / matches |
cat→b ✓ 一致 / matches |
無需更動 / consistent | 不變 / unchanged |
| 3 | a |
dog |
a→dog ✓ 一致 / matches |
dog→a ✓ 一致 / matches |
無需更動 / consistent | 不變 / unchanged |
走完全部 4 對都沒有矛盾,且字母數 (4) = 單字數 (4),回傳 true。
All 4 pairs are consistent and counts match (4 = 4), so return true.
對比 Example 2(s = "dog cat cat fish"):在 i=3 時字母 a 之前記錄是 dog,現在單字是 fish,char2word 查到不一致 → 立刻回傳 false。
For Example 2, at i=3 letter a was recorded as dog but now the word is fish — the forward map mismatches, so return false immediately.
Solution — C
// 演算法 / Algorithm:
// 用兩個方向的對應表驗證「字母 ↔ 單字」是雙射。
// char2word[26] 存每個字母對應的單字;用一個雜湊集合記住每個單字已被哪個字母佔用。
// 邊用 strtok 切單字,邊逐字母比對;任何不一致即回傳 false。
// We verify a letter↔word bijection with two-way mapping. char2word holds each
// letter's word; a small hash set remembers which letter owns each word.
#include <stdbool.h> // 提供 bool / true / false / gives us the bool type
#include <string.h> // 提供 strtok, strcmp, strdup / string helpers
#include <stdlib.h> // 提供 malloc, free / memory helpers
// 簡單雜湊表節點:把「單字 → 佔用它的字母」存成鏈結串列
// A hash-table node mapping a word to the letter that owns it.
typedef struct Node {
char *word; // 單字字串 / the word text
char letter; // 佔用此單字的字母 / the letter that claimed this word
struct Node *next; // 同一個桶內的下一個節點 / next node in the same bucket
} Node;
#define BUCKETS 1024 // 雜湊桶數量,取 2 的次方方便用 & 取餘 / number of buckets
// 字串雜湊函式:把單字轉成一個桶的索引
// String hash: turn a word into a bucket index.
static unsigned hashStr(const char *s) {
unsigned h = 5381; // djb2 常見起始值 / classic djb2 seed
for (; *s; s++) // 逐字元 / walk each character
h = h * 33 + (unsigned char)*s; // 經典 djb2 混合 / djb2 mixing step
return h & (BUCKETS - 1); // 取桶索引(等同 % BUCKETS)/ bucket index
}
bool wordPattern(char *pattern, char *s) {
char *char2word[26] = {0}; // 字母→單字,初值全為 NULL 代表「尚未對應」/ letter→word, NULL = unmapped
Node *buckets[BUCKETS] = {0};// 單字→字母 的雜湊表,全部桶先清空 / word→letter hash table, empty
bool ok = true; // 最終答案,先假設成立 / answer, assume true until proven false
int pi = 0; // pattern 的索引(第幾個字母)/ index into pattern
int plen = (int)strlen(pattern); // pattern 長度,最後比對單字數用 / pattern length
// strtok 會就地切割 s,所以複製一份避免破壞輸入
// strtok modifies s in place, so copy it to keep the original intact.
char *copy = strdup(s); // strdup = malloc + 複製字串 / duplicate the string
// strtok:第一次傳字串,之後傳 NULL,會依分隔符 " " 逐段回傳單字
// strtok: first call passes the string, later calls pass NULL to keep going.
char *word = strtok(copy, " ");
while (word != NULL) { // 還有單字就繼續 / loop while words remain
if (pi >= plen) { // 單字比字母多 → 不可能匹配 / more words than letters
ok = false;
break;
}
char c = pattern[pi]; // 目前要配對的字母 / current letter
int li = c - 'a'; // 把 'a'..'z' 轉成 0..25 當陣列索引 / letter as 0..25
// --- 檢查正向:字母 → 單字 ---
// Check forward direction: letter → word.
if (char2word[li] != NULL) { // 這個字母以前對應過 / letter seen before
if (strcmp(char2word[li], word) != 0) { // 但單字不同 → 矛盾 / different word now
ok = false;
break;
}
// 一致就不必動表,直接看下一個 / consistent, fall through
} else {
// --- 字母是新的:先確認這個單字沒被別的字母佔走(反向)---
// New letter: make sure no other letter already owns this word (reverse).
unsigned b = hashStr(word); // 找到單字所在的桶 / locate bucket
Node *cur = buckets[b];
char owner = 0; // 0 代表「沒人佔用」/ 0 means unowned
while (cur) { // 走訪桶內鏈結串列 / scan the chain
if (strcmp(cur->word, word) == 0) { // 找到相同單字 / found this word
owner = cur->letter; // 記下佔用者 / record its owner letter
break;
}
cur = cur->next;
}
if (owner != 0 && owner != c) { // 單字已被「別的」字母佔走 / word taken by another letter
ok = false;
break;
}
if (owner == 0) { // 真的是新單字才插入反向表 / brand-new word → insert
Node *n = (Node *)malloc(sizeof(Node)); // 配置節點 / allocate node
n->word = strdup(word); // 複製單字(copy 之後會被釋放)/ own a copy of the word
n->letter = c; // 記住佔用字母 / store owning letter
n->next = buckets[b]; // 頭插法接到桶上 / push to front of bucket
buckets[b] = n;
}
char2word[li] = word; // 建立正向對應(指向 copy 內字串)/ set forward map
}
pi++; // 換下一個字母 / advance letter index
word = strtok(NULL, " "); // 取下一個單字 / grab next word
}
// 走完後,字母數必須剛好等於單字數,否則字母比單字多 → false
// After the loop, letters and words must be equal in count.
if (ok && pi != plen) ok = false; // pi 是已用掉的字母數 / pi = letters consumed
// 釋放反向表所有節點與字串,避免記憶體洩漏
// Free every node and its duplicated word to avoid leaks.
for (int b = 0; b < BUCKETS; b++) {
Node *cur = buckets[b];
while (cur) {
Node *nxt = cur->next; // 先存下一個再釋放當前 / save next before freeing
free(cur->word); // 釋放複製的單字 / free duplicated word
free(cur); // 釋放節點本身 / free the node
cur = nxt;
}
}
free(copy); // 釋放整份字串副本 / free the string copy
return ok; // 回傳結果 / return the verdict
}
Solution — C++
// 演算法 / Algorithm:
// 用兩張 unordered_map 維護「字母→單字」與「單字→字母」兩個方向的對應,
// 一邊用 istringstream 切詞一邊比對;任一方向衝突即 false,最後字數必須相等。
// Two unordered_maps track letter→word and word→letter; we split words with a
// stringstream and check both directions, requiring equal counts at the end.
#include <string>
#include <sstream> // istringstream:方便用空格切詞 / split words by spaces
#include <unordered_map> // 雜湊表 / hash maps
class Solution {
public:
bool wordPattern(std::string pattern, std::string s) {
// 兩張對應表 / the two mapping tables
std::unordered_map<char, std::string> char2word; // 字母 → 單字 / letter → word
std::unordered_map<std::string, char> word2char; // 單字 → 字母 / word → letter
std::istringstream iss(s); // 把 s 包成可逐詞讀取的串流 / wrap s as a token stream
std::string word; // 暫存當前單字 / holds the current word
size_t i = 0; // 目前對到 pattern 的第幾個字母 / index into pattern
// iss >> word 每次讀出一個以空白分隔的單字,讀完回傳 false 結束迴圈
// `iss >> word` extracts one whitespace-separated word each loop.
while (iss >> word) {
if (i >= pattern.size()) return false; // 單字比字母多 / more words than letters
char c = pattern[i]; // 當前字母 / current letter
// auto + find:在表中查 c;it == end() 代表「沒找到」
// `find` returns an iterator; == end() means "not present".
auto fw = char2word.find(c);
if (fw != char2word.end()) {
// 字母已對應過,必須是同一個單字 / letter mapped before → must match
if (fw->second != word) return false; // fw->second 是對應的單字 / its word
} else {
// 字母是新的:確認單字未被別的字母佔用 / new letter → word must be free
auto rv = word2char.find(word);
if (rv != word2char.end()) {
// 單字已被某字母佔用,且不是 c → 衝突 / word owned by another letter
if (rv->second != c) return false;
} else {
// 兩個方向都是全新,建立雙向對應 / brand-new pair → record both ways
char2word[c] = word; // operator[] 不存在就插入 / insert forward
word2char[word] = c; // 插入反向 / insert reverse
}
}
++i; // 換下一個字母 / advance to next letter
}
// 字母數必須等於單字數,否則字母比單字多 → false
// Letters must equal words; leftover letters mean no full match.
return i == pattern.size();
}
};
複雜度 / Complexity
- Time / 時間:O(n + m) — n 是
pattern的長度,m 是s的長度。我們只把pattern掃一次、把s切詞掃一次,每個字母與單字的查表/插入平均都是 O(1)(字串雜湊與比較的成本攤在單字長度上)。沒有巢狀回頭掃描,所以是線性的。 / We pass overpatternonce and oversonce; each map operation is average O(1). No nested rescans, so it's linear in the input size. - Space / 空間:O(k) — k 是不同字母與不同單字的數量(最多各 26 個字母、若干單字)。兩張表加起來儲存的對應數量受字母與單字種類數限制;C 版另外複製一份
s也是 O(m)。 / The two maps store at most one entry per distinct letter/word; the C version also duplicatess, which is O(m).
Pitfalls & Edge Cases
- 只檢查單一方向會錯 / Checking only one direction fails:若只記「字母→單字」,
pattern="ab",s="dog dog"會誤判為true(a→dog、b→dog,但兩個字母搶同一個單字)。必須同時維護反向表word→char。 / A single-direction map lets two letters share one word; both directions are required. - 長度不一致 / Length mismatch:字母數和單字數不同時一定是
false。範例如pattern="aaaa",s="dog cat"。程式用「迴圈中發現單字超量」與「結尾i == size」兩道檢查涵蓋兩種方向。 / Unequal letter/word counts must return false; we guard both "too many words" mid-loop and "leftover letters" at the end. strtok會破壞原字串 /strtokmutates its input:C 版先strdup(s)再切,避免改壞傳入的s,也避免修改唯讀字面值。 / We copysbefore tokenizing becausestrtokwrites into the buffer.- 記憶體洩漏 / Memory leaks:C 版每個
malloc/strdup都要對應free,結尾統一釋放整張雜湊表與字串副本。 / Every allocation in C is freed at the end to avoid leaks. - 空字串對應不存在 / No empty words:題目保證單字非空、無前後空格、單一空格分隔,所以不需處理空 token;但若放寬輸入,
istringstream與strtok對連續空格的行為會不同,需留意。 / The constraints guarantee single-space separation, so we don't handle empty tokens — but be awarestrtokandistringstreamdiffer on consecutive spaces. - 字母索引 / Letter indexing:C 版用
c - 'a'把字母轉成 0–25 陣列索引;只在保證輸入是小寫字母時才安全。 /c - 'a'is valid only becausepatternis guaranteed lowercase.