76. Minimum Window Substring

題目 / Problem

中文給定兩個字串 s 和 t，長度分別為 m 和 n。請回傳 s 中最短的「子字串」，使得這個子字串包含 t 裡的每一個字元（包含重複的字元）。如果不存在這樣的子字串，回傳空字串 ""。

「子字串」指的是字串中連續的一段字元。
測資保證答案是唯一的。
t 中若有重複字元（例如兩個 a），那麼視窗裡也必須至少有同樣多的 a。

English Given two strings s and t of lengths m and n, return the shortest substring of s that contains every character of t (including duplicates). If no such substring exists, return the empty string "".

A "substring" is a contiguous slice of characters.
The answer is guaranteed to be unique.
If t has duplicate characters, the window must contain at least that many copies too.

Constraints / 限制 - m == s.length, n == t.length - 1 <= m, n <= 10^5 - s 和 t 只由大小寫英文字母組成 / s and t consist of uppercase and lowercase English letters. - Follow up: 能否做到 O(m + n) 時間？/ Can you do it in O(m + n) time?

Worked example / 範例

Input:  s = "ADOBECODEBANC", t = "ABC"
Output: "BANC"

"BANC" 是最短且同時包含 A、B、C 的連續片段。雖然 "ADOBEC" 也包含 A、B、C，但它更長。 "BANC" is the shortest contiguous slice containing A, B, and C. "ADOBEC" also works but is longer.

名詞解釋 / Glossary

子字串 / Substring：字串中連續的一段字元（不可跳著選）。例如 "BAN" 是 "ADOBECODEBANC" 的子字串。A contiguous slice of a string — you cannot skip characters.
雙指針（滑動視窗）/ Two pointers (sliding window)：用 left 和 right 兩個下標標出一個區間 [left, right]，透過移動這兩端來「滑動」或「縮放」這個視窗，而不必每次都重新檢查整段。Two indices left and right mark a range; moving them slides/resizes a "window" over the string so we don't re-scan from scratch.
頻率表 / Frequency table (count array)：一個陣列，用字元當索引、出現次數當值。因為字元只有大小寫英文字母，用大小 128 的整數陣列 cnt[c] 就能 O(1) 查到某字元出現幾次。An array indexed by character; cnt[c] stores how many times character c appears. Size 128 covers all ASCII letters.
雜湊表 / Hash map：鍵值對的容器，可用任意鍵（這裡是字元）快速查值。C++ 的 unordered_map<char,int> 就是一種。A key–value container giving fast lookup; here it maps a character to a count.
不變量 / Invariant：在演算法執行過程中始終保持為真的條件。本題的關鍵不變量是：「視窗向右擴張到剛好覆蓋 t 後，才嘗試從左邊收縮。」A condition that always holds during the loop; here: we only shrink from the left once the window already covers all of t.
formed / required：required 是 t 中不同字元的數量；formed 是目前視窗中已滿足所需次數的不同字元數量。當 formed == required 時，整個視窗就覆蓋了 t。required = number of distinct chars in t; formed = how many of those are currently satisfied in the window. They are equal exactly when the window is valid.

思路

最直覺的暴力法是列舉 s 的所有子字串：固定起點，往右延伸到每一個終點，對每個子字串檢查是否包含 t 的所有字元。但 s 的子字串有 O(m^2) 個，每個又要花時間檢查，總共會是 O(m^2 · n) 或 O(m^2) 等級，當 m 高達 10^5 時必然超時。問題出在我們不斷重複檢查重疊的區間。觀察可知：當一個視窗已經覆蓋 t 時，把右端再往右移只會讓視窗變長、依然覆蓋；而真正能縮短答案的，是在「仍然覆蓋」的前提下把左端往右收。這正是滑動視窗的用武之地。我們用 need[c] 記錄 t 中每個字元還需要幾個，再用 required（t 的不同字元數）和 formed（目前已被滿足的不同字元數）來判斷視窗是否合格。右指針每次吃進一個字元就更新計數；一旦 formed == required，視窗合格，我們就盡量收縮左指針——每收掉一個字元前先看它會不會破壞覆蓋，只要不破壞就繼續收，並隨時記錄遇到的最短合格視窗。因為每個字元最多被右指針進、被左指針出各一次，總共只走兩遍，達成 O(m + n)。關鍵不變量是：只有在視窗已覆蓋 t 時才收縮左端，這保證我們找到的每個「即將失效」的位置都對應一個極小合格視窗。

The brute force lists every substring of s: fix a start, extend to every end, and check each candidate against t. With O(m^2) substrings and a check on each, that's far too slow for m up to 10^5 — and it wastes work re-checking overlapping ranges. The key insight: once a window already covers t, pushing the right edge further only lengthens it; the only way to shorten the answer is to pull the left edge inward while the window stays valid. That's a sliding window. Keep need[c] = how many of character c are still required from t, plus required (the count of distinct characters in t) and formed (how many of those are currently satisfied). Each step the right pointer absorbs one character and updates counts; the moment formed == required the window is valid, so we greedily shrink from the left — before dropping each character we check it won't break coverage, and we record the shortest valid window seen so far. Every character enters via right once and leaves via left at most once, so we make two linear passes overall: O(m + n). The governing invariant is that we shrink the left edge only when the window already covers t, guaranteeing each "about to break" position corresponds to a locally minimal valid window.

逐步走查 / Walkthrough

以 s = "ADOBECODEBANC", t = "ABC" 為例。required = 3（不同字元 A、B、C）。need = {A:1, B:1, C:1}。我們用 have[c] 記錄目前視窗中各字元數量，formed 記錄已滿足的不同字元數。下標從 0 開始。

For s = "ADOBECODEBANC", t = "ABC": required = 3, need = {A:1, B:1, C:1}. have[c] tracks counts inside the window, formed counts satisfied distinct chars.

right	進入字元 / char in	視窗 / window `[left,right]`	formed	動作 / action	目前最佳 / best so far
0	A	`[0,0]`="A"	1	A 滿足 / A satisfied	—
1	D	`[0,1]`="AD"	1	D 不需要 / not needed	—
2	O	`[0,2]`="ADO"	1	—	—
3	B	`[0,3]`="ADOB"	2	B 滿足 / B satisfied	—
4	E	`[0,4]`="ADOBE"	2	—	—
5	C	`[0,5]`="ADOBEC"	3	合格！收縮 / valid! shrink	len 6 → best="ADOBEC"
	收 A→left=1 / drop A	`[1,5]`="DOBEC"	2	A 數量降到 0，formed-- / A drops below need	best 仍 / still "ADOBEC"
6	O	`[1,6]`="DOBECO"	2	—	—
7	D	`[1,7]`	2	—	—
8	E	`[1,8]`	2	—	—
9	B	`[1,9]`="DOBECODEB"	2	B 多了一個 / extra B	—
10	A	`[1,10]`="DOBECODEBA"	3	合格！收縮 / valid! shrink	len 10 > 6，不更新
	收 D,O,B,E,C,O,D,E → left=9 / drop until A needed	`[9,10]`="BA"	… 收到 left=9 後遇到 B 仍夠、C 不夠…

收縮細節：從 left=1 開始丟掉 D、O、B、E、C…一旦想丟 C（left 指到 5 的 C）會讓 have[C] 變 0、formed 掉到 2，於是停止收縮，此時視窗是 [5,10]="CODEBA"? 實際上停在剛好還合格的最短位置。 Shrinking detail: dropping non-essential chars is free; the moment removing a char would drop a needed count below need, we stop. We keep going as right advances.

right	進入字元 / char in	視窗 / window	formed	動作 / action	best
11	N	…	…	N 不需要 / not needed	—
12	C	`[…,12]`	3	合格！收縮 / valid! shrink	收縮後得到 "BANC" len 4 < 6 → best="BANC"

最終 best = "BANC"，長度 4，即為答案。 Final best = "BANC", length 4 — the answer.

Solution — C

// 演算法 / Algorithm: 滑動視窗 (sliding window)。
// 右指針擴張視窗直到覆蓋 t；覆蓋後收縮左指針求最短，過程中記錄最短合格視窗。
// Expand right until the window covers t; then shrink from the left to minimize,
// recording the shortest valid window. Each char is visited at most twice → O(m+n).

char* minWindow(char* s, char* t) {
    // 取得兩字串長度 / get the lengths of both strings.
    // strlen 會掃到結尾的 ''，所以先算好避免重複計算 / cache lengths to avoid recomputing.
    int m = 0; while (s[m] != '') m++;   // 手算 s 長度 / count s manually
    int n = 0; while (t[n] != '') n++;   // 手算 t 長度 / count t manually

    // 若 t 比 s 長，s 絕不可能覆蓋 t，直接回傳空字串
    // If t is longer than s, no window can cover it → return "".
    if (n == 0 || m < n) {
        char* empty = (char*)malloc(1);    // 配置 1 byte 放結尾符 / 1 byte for the terminator
        empty[0] = '';                   // C 字串以 '' 結尾 / C strings end with ''
        return empty;
    }

    // need[c]：t 中字元 c 還需要幾個 / how many of char c are still required.
    // have[c]：目前視窗中字元 c 有幾個 / how many of char c are in the window now.
    // 大小 128 可覆蓋所有 ASCII 字母 / size 128 covers all ASCII letters.
    int need[128] = {0};                   // 全部初始化為 0 / initialize all to 0
    int have[128] = {0};

    // 統計 t 中每個字元的需求量 / tally how many of each char t needs.
    // (unsigned char) 轉型避免負索引（若有非 ASCII） / cast guards against negative indexing.
    for (int i = 0; i < n; i++)
        need[(unsigned char)t[i]]++;

    // required：t 中不同字元的種類數 / number of DISTINCT chars in t.
    int required = 0;
    for (int c = 0; c < 128; c++)
        if (need[c] > 0) required++;

    // formed：視窗中「次數已達標」的不同字元數 / distinct chars already satisfied.
    // 當 formed == required 時，視窗就覆蓋了整個 t / window covers t when these are equal.
    int formed = 0;

    int left = 0;                          // 視窗左端 / left edge of window
    int bestLen = m + 1;                   // 最短合格長度，初始設成不可能的大值 / best length, start huge
    int bestStart = 0;                     // 最短視窗的起點 / start index of best window

    // right 從 0 掃到 m-1，逐一把字元加進視窗 / right scans s, absorbing one char each step.
    for (int right = 0; right < m; right++) {
        unsigned char c = (unsigned char)s[right];  // 取出右端字元 / the char entering the window
        have[c]++;                                  // 視窗中該字元數 +1 / its count goes up

        // 若這個字元剛好讓某需求被「填滿」(不多不少)，formed +1
        // If this char's count just reached exactly what t needs, one more char is satisfied.
        if (need[c] > 0 && have[c] == need[c])
            formed++;

        // 視窗已覆蓋 t 時，盡量從左邊收縮以求最短
        // While the window is valid, shrink from the left to minimize it.
        while (formed == required) {
            int curLen = right - left + 1;          // 目前視窗長度 / current window length
            if (curLen < bestLen) {                 // 發現更短的合格視窗就記下來 / record if shorter
                bestLen = curLen;
                bestStart = left;
            }

            unsigned char lc = (unsigned char)s[left]; // 即將被移出視窗的左端字元 / char leaving
            have[lc]--;                                // 視窗中該字元數 -1 / its count drops
            left++;                                    // 左端右移，視窗縮小 / move left edge inward

            // 若移除後該字元數低於需求，視窗不再合格，formed -1，迴圈結束
            // If removing it drops below the need, the window is no longer valid → stop shrinking.
            if (need[lc] > 0 && have[lc] < need[lc])
                formed--;
        }
    }

    // 從沒找到合格視窗 → bestLen 仍是初始值 → 回傳空字串
    // Never found a valid window → return "".
    if (bestLen == m + 1) {
        char* empty = (char*)malloc(1);
        empty[0] = '';
        return empty;
    }

    // 配置 bestLen+1 個 byte（多 1 個放結尾 ''）/ allocate bestLen+1 bytes (extra for '').
    char* result = (char*)malloc(bestLen + 1);
    for (int i = 0; i < bestLen; i++)
        result[i] = s[bestStart + i];      // 從 s 複製最短視窗的字元 / copy the best window out
    result[bestLen] = '';                // 補上結尾符，使其成為合法 C 字串 / terminate the string
    return result;                         // 呼叫者負責 free / caller frees this buffer
}

Solution — C++

// 演算法 / Algorithm: 滑動視窗 (sliding window)，與 C 版相同邏輯。
// 用兩個 unordered_map 存「需求」與「視窗現況」，formed/required 判斷是否覆蓋 t。
// Expand right to cover t, shrink left to minimize, track the shortest valid window.
// Each character is processed at most twice → O(m + n).

#include <string>
#include <unordered_map>
#include <climits>   // INT_MAX
using namespace std;

class Solution {
public:
    string minWindow(string s, string t) {
        // 邊界：t 為空或比 s 長則不可能覆蓋 / impossible cases → empty result.
        if (t.empty() || s.size() < t.size()) return "";

        // unordered_map<char,int>：以字元為鍵、次數為值的雜湊表 / hash map char → count.
        // need 記錄 t 各字元需求量 / how many of each char t requires.
        unordered_map<char, int> need;
        for (char c : t) need[c]++;        // range-for：逐一遍歷 t 的字元 / iterate over t's chars

        unordered_map<char, int> have;     // 視窗目前各字元數量 / counts inside the window

        int required = (int)need.size();   // t 的不同字元種類數 / number of distinct chars in t
        int formed = 0;                    // 已滿足需求的不同字元數 / satisfied distinct chars

        int left = 0;                      // 視窗左端 / left edge
        int bestLen = INT_MAX;             // 最短合格長度，初始為極大值 / best length so far
        int bestStart = 0;                 // 最短視窗起點 / start of best window

        // right 為視窗右端，逐字擴張 / right edge advances one char at a time.
        for (int right = 0; right < (int)s.size(); right++) {
            char c = s[right];             // 進入視窗的字元 / entering char
            have[c]++;                     // 視窗中該字元數 +1 / bump its count

            // 若該字元次數剛好達標，formed +1
            // need.count(c) 檢查 c 是否是 t 需要的字元 / does t need this char?
            if (need.count(c) && have[c] == need[c])
                formed++;

            // 視窗覆蓋 t 時，從左邊盡量收縮 / shrink while the window stays valid.
            while (formed == required) {
                int curLen = right - left + 1;   // 目前視窗長度 / current length
                if (curLen < bestLen) {          // 更短就更新最佳解 / update best if shorter
                    bestLen = curLen;
                    bestStart = left;
                }

                char lc = s[left];               // 即將移出的左端字元 / char leaving the window
                have[lc]--;                      // 該字元數 -1 / drop its count
                left++;                          // 左端右移 / move left inward

                // 移除後低於需求 → 視窗失效 → formed -1，停止收縮
                // If it drops below the need, the window breaks → stop shrinking.
                if (need.count(lc) && have[lc] < need[lc])
                    formed--;
            }
        }

        // 找不到合格視窗則回空字串；否則用 substr 切出最短視窗
        // No valid window → ""; otherwise slice it out with substr(start, length).
        return bestLen == INT_MAX ? string("") : s.substr(bestStart, bestLen);
    }
};

複雜度 / Complexity

Time: O(m + n) — n 來自統計 t 的字元頻率（掃一遍 t）。主迴圈中 right 從頭到尾走一遍 s（m 次），而 left 全程也最多前進 m 次（每個字元最多被移出一次），所以視窗操作合計 O(m)。雖然有巢狀 while，但 left 不會倒退，總移動次數受 m 限制，並非 O(m^2)。n is from counting t; the main loop advances right across s once and left at most m times total (each char leaves once), so the nested while is amortized linear, not quadratic.
Space: O(m + n)（或視字符集為 O(1)）— C 版用固定大小 128 的兩個陣列，是常數空間 O(1)；C++ 版的 unordered_map 最多存下 t 的不同字元（need）與視窗中出現過的不同字元（have），故 O(字元種類數)，對固定字母表同樣可視為 O(1)。輸出字串需 O(m)。The count structures hold at most the distinct characters (a fixed alphabet → effectively O(1)); the output string takes O(m) in the worst case.

Pitfalls & Edge Cases

重複字元 / Duplicate characters in t：t="aa" 需要兩個 a。如果只用「集合」記錄需求而忽略次數，就會錯判 s="a" 為合格。本解用 need[c] 存次數並比對 have[c] == need[c] 才算滿足，正確處理重複。Using a set instead of counts would wrongly accept too few copies; we store counts and require have[c] == need[c].
formed 只在「剛好達標」時 +1 / Increment formed only on exact match：條件寫 have[c] == need[c] 而非 >=。若用 >=，同一字元的多餘出現會重複加 formed，導致 formed 超過 required，邏輯錯亂。Using >= would over-count formed on extra occurrences; == fires exactly once per character type.
收縮時機 / When to shrink：只有在 formed == required（視窗已合格）時才收縮左端。先擴張到合格、再收縮，才能保證每個記錄的視窗都覆蓋 t。Shrinking before the window is valid would record invalid windows.
回傳空字串而非 NULL / Return "", not NULL：C 版找不到時要 malloc(1) 放一個 '\0'；直接回傳未初始化指標或 NULL 會出錯。Always return a valid empty string buffer in C.
C 字串結尾 / Null terminator in C：複製視窗後務必補 result[bestLen] = '\0'，否則印出時會越界讀取垃圾資料。Forgetting the terminator causes out-of-bounds reads.
bestLen 初始值 / Initial bestLen：設成 m+1（C）或 INT_MAX（C++）這種「不可能達到」的大值，用來判斷是否從未找到答案。A sentinel large value distinguishes "no window found" from a real result.
索引型別與負值 / Indexing safety：C 版以 (unsigned char) 轉型再當索引，避免某些平台上 char 為帶號型別時產生負索引。Casting to unsigned char prevents negative array indices.