76. Minimum Window Substring
題目 / Problem
中文
給定兩個字串 s 和 t,長度分別為 m 和 n。請回傳 s 中最短的「子字串」,使得這個子字串包含 t 裡的每一個字元(包含重複的字元)。如果不存在這樣的子字串,回傳空字串 ""。
- 「子字串」指的是字串中連續的一段字元。
- 測資保證答案是唯一的。
t中若有重複字元(例如兩個a),那麼視窗裡也必須至少有同樣多的a。
English
Given two strings s and t of lengths m and n, return the shortest substring of s that contains every character of t (including duplicates). If no such substring exists, return the empty string "".
- A "substring" is a contiguous slice of characters.
- The answer is guaranteed to be unique.
- If
thas duplicate characters, the window must contain at least that many copies too.
Constraints / 限制
- m == s.length, n == t.length
- 1 <= m, n <= 10^5
- s 和 t 只由大小寫英文字母組成 / s and t consist of uppercase and lowercase English letters.
- Follow up: 能否做到 O(m + n) 時間?/ Can you do it in O(m + n) time?
Worked example / 範例
Input: s = "ADOBECODEBANC", t = "ABC"
Output: "BANC"
"BANC" 是最短且同時包含 A、B、C 的連續片段。雖然 "ADOBEC" 也包含 A、B、C,但它更長。
"BANC" is the shortest contiguous slice containing A, B, and C. "ADOBEC" also works but is longer.
名詞解釋 / Glossary
- 子字串 / Substring:字串中連續的一段字元(不可跳著選)。例如
"BAN"是"ADOBECODEBANC"的子字串。A contiguous slice of a string — you cannot skip characters. - 雙指針(滑動視窗)/ Two pointers (sliding window):用
left和right兩個下標標出一個區間[left, right],透過移動這兩端來「滑動」或「縮放」這個視窗,而不必每次都重新檢查整段。Two indicesleftandrightmark a range; moving them slides/resizes a "window" over the string so we don't re-scan from scratch. - 頻率表 / Frequency table (count array):一個陣列,用字元當索引、出現次數當值。因為字元只有大小寫英文字母,用大小 128 的整數陣列
cnt[c]就能 O(1) 查到某字元出現幾次。An array indexed by character;cnt[c]stores how many times charactercappears. Size 128 covers all ASCII letters. - 雜湊表 / Hash map:鍵值對的容器,可用任意鍵(這裡是字元)快速查值。C++ 的
unordered_map<char,int>就是一種。A key–value container giving fast lookup; here it maps a character to a count. - 不變量 / Invariant:在演算法執行過程中始終保持為真的條件。本題的關鍵不變量是:「視窗向右擴張到剛好覆蓋
t後,才嘗試從左邊收縮。」A condition that always holds during the loop; here: we only shrink from the left once the window already covers all oft. formed/required:required是t中不同字元的數量;formed是目前視窗中已滿足所需次數的不同字元數量。當formed == required時,整個視窗就覆蓋了t。required= number of distinct chars int;formed= how many of those are currently satisfied in the window. They are equal exactly when the window is valid.
思路
最直覺的暴力法是列舉 s 的所有子字串:固定起點,往右延伸到每一個終點,對每個子字串檢查是否包含 t 的所有字元。但 s 的子字串有 O(m^2) 個,每個又要花時間檢查,總共會是 O(m^2 · n) 或 O(m^2) 等級,當 m 高達 10^5 時必然超時。問題出在我們不斷重複檢查重疊的區間。觀察可知:當一個視窗已經覆蓋 t 時,把右端再往右移只會讓視窗變長、依然覆蓋;而真正能縮短答案的,是在「仍然覆蓋」的前提下把左端往右收。這正是滑動視窗的用武之地。我們用 need[c] 記錄 t 中每個字元還需要幾個,再用 required(t 的不同字元數)和 formed(目前已被滿足的不同字元數)來判斷視窗是否合格。右指針每次吃進一個字元就更新計數;一旦 formed == required,視窗合格,我們就盡量收縮左指針——每收掉一個字元前先看它會不會破壞覆蓋,只要不破壞就繼續收,並隨時記錄遇到的最短合格視窗。因為每個字元最多被右指針進、被左指針出各一次,總共只走兩遍,達成 O(m + n)。關鍵不變量是:只有在視窗已覆蓋 t 時才收縮左端,這保證我們找到的每個「即將失效」的位置都對應一個極小合格視窗。
The brute force lists every substring of s: fix a start, extend to every end, and check each candidate against t. With O(m^2) substrings and a check on each, that's far too slow for m up to 10^5 — and it wastes work re-checking overlapping ranges. The key insight: once a window already covers t, pushing the right edge further only lengthens it; the only way to shorten the answer is to pull the left edge inward while the window stays valid. That's a sliding window. Keep need[c] = how many of character c are still required from t, plus required (the count of distinct characters in t) and formed (how many of those are currently satisfied). Each step the right pointer absorbs one character and updates counts; the moment formed == required the window is valid, so we greedily shrink from the left — before dropping each character we check it won't break coverage, and we record the shortest valid window seen so far. Every character enters via right once and leaves via left at most once, so we make two linear passes overall: O(m + n). The governing invariant is that we shrink the left edge only when the window already covers t, guaranteeing each "about to break" position corresponds to a locally minimal valid window.
逐步走查 / Walkthrough
以 s = "ADOBECODEBANC", t = "ABC" 為例。required = 3(不同字元 A、B、C)。need = {A:1, B:1, C:1}。我們用 have[c] 記錄目前視窗中各字元數量,formed 記錄已滿足的不同字元數。下標從 0 開始。
For s = "ADOBECODEBANC", t = "ABC": required = 3, need = {A:1, B:1, C:1}. have[c] tracks counts inside the window, formed counts satisfied distinct chars.
| right | 進入字元 / char in | 視窗 / window [left,right] |
formed | 動作 / action | 目前最佳 / best so far |
|---|---|---|---|---|---|
| 0 | A | [0,0]="A" |
1 | A 滿足 / A satisfied | — |
| 1 | D | [0,1]="AD" |
1 | D 不需要 / not needed | — |
| 2 | O | [0,2]="ADO" |
1 | — | — |
| 3 | B | [0,3]="ADOB" |
2 | B 滿足 / B satisfied | — |
| 4 | E | [0,4]="ADOBE" |
2 | — | — |
| 5 | C | [0,5]="ADOBEC" |
3 | 合格!收縮 / valid! shrink | len 6 → best="ADOBEC" |
| 收 A→left=1 / drop A | [1,5]="DOBEC" |
2 | A 數量降到 0,formed-- / A drops below need | best 仍 / still "ADOBEC" | |
| 6 | O | [1,6]="DOBECO" |
2 | — | — |
| 7 | D | [1,7] |
2 | — | — |
| 8 | E | [1,8] |
2 | — | — |
| 9 | B | [1,9]="DOBECODEB" |
2 | B 多了一個 / extra B | — |
| 10 | A | [1,10]="DOBECODEBA" |
3 | 合格!收縮 / valid! shrink | len 10 > 6,不更新 |
| 收 D,O,B,E,C,O,D,E → left=9 / drop until A needed | [9,10]="BA" |
… 收到 left=9 後遇到 B 仍夠、C 不夠… |
收縮細節:從 left=1 開始丟掉 D、O、B、E、C…一旦想丟 C(left 指到 5 的 C)會讓 have[C] 變 0、formed 掉到 2,於是停止收縮,此時視窗是 [5,10]="CODEBA"? 實際上停在剛好還合格的最短位置。
Shrinking detail: dropping non-essential chars is free; the moment removing a char would drop a needed count below need, we stop. We keep going as right advances.
| right | 進入字元 / char in | 視窗 / window | formed | 動作 / action | best |
|---|---|---|---|---|---|
| 11 | N | … | … | N 不需要 / not needed | — |
| 12 | C | […,12] |
3 | 合格!收縮 / valid! shrink | 收縮後得到 "BANC" len 4 < 6 → best="BANC" |
最終 best = "BANC",長度 4,即為答案。
Final best = "BANC", length 4 — the answer.
Solution — C
// 演算法 / Algorithm: 滑動視窗 (sliding window)。
// 右指針擴張視窗直到覆蓋 t;覆蓋後收縮左指針求最短,過程中記錄最短合格視窗。
// Expand right until the window covers t; then shrink from the left to minimize,
// recording the shortest valid window. Each char is visited at most twice → O(m+n).
char* minWindow(char* s, char* t) {
// 取得兩字串長度 / get the lengths of both strings.
// strlen 會掃到結尾的 '