28. Find the Index of the First Occurrence in a String

題目 / Problem

中文： 給定兩個字串 needle 和 haystack，回傳 needle 在 haystack 中第一次出現的索引；如果 needle 不是 haystack 的一部分，回傳 -1。

English: Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Constraints / 限制： - 1 <= haystack.length, needle.length <= 10^4 - haystack 和 needle 只包含小寫英文字母 / both consist of only lowercase English letters.

Example / 範例：

Input:  haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" 出現在索引 0 和 6，第一次出現的是 0。
             "sad" appears at indices 0 and 6; the first one is 0.

名詞解釋 / Glossary

子字串 / substring：一個字串中連續的一段字元。例如 "abc" 是 "xabcy" 的子字串，但 "axc" 不是（必須連續）。A contiguous slice of characters inside a larger string.
索引 / index：字串中字元的位置編號，從 0 開始。"sad"[0] 是 's'。Zero-based position of a character.
雙指針 / two pointers：用兩個整數變數分別追蹤兩個位置（這裡是 haystack 與 needle 的當前位置），透過比較與移動來解題。Two integer variables tracking positions in one or two arrays/strings; they advance based on the comparison logic.
暴力法 / brute force：直接列舉所有可能的起始位置，逐一驗證。簡單但通常較慢。Try every possible starting position and verify; simple but often slow.
strlen / string length：C 標準函式，回傳以 '\0' 結尾的字串長度，需要掃過整個字串，是 O(n) 操作，所以我們把它存進變數而不是反覆呼叫。strlen walks the string to find the null terminator; cache it instead of calling it inside loops.
std::string::substr / 取子字串：C++ 中 s.substr(i, len) 回傳從 i 開始長度為 len 的子字串，會分配新記憶體。Allocates a fresh substring of given length starting at index i.

思路

這題本質上是「在大字串中找小字串」，也就是經典的字串匹配問題。最直觀的暴力法是：對 haystack 的每一個可能起點 i，去比較 haystack[i..i+m-1] 是否等於 needle（m 是 needle 的長度）。如果一路相等就回傳 i，否則 i 往後挪一格繼續試。一個小但重要的優化是：i 只需要枚舉到 n - m（n 是 haystack 長度），因為從更後面開始的話，剩餘字元根本不夠長，不可能匹配，這也順便避免越界讀取。雖然這個方法最壞情況是 O(n * m)，但在本題 n, m <= 10^4 的限制下，加上字串只有小寫字母，匹配通常很快就會失敗並提早跳出，實際表現完全足夠。更進階的 KMP / Z-function 可以做到 O(n + m)，但對初學者來說，先掌握雙指針暴力法的正確性與邊界條件更重要。

The problem is classic substring search: find where a small string needle first appears inside a larger string haystack. The brute-force two-pointer approach is to try every possible starting index i in haystack and check whether the next m characters match needle character by character. The key boundary observation is that i only needs to range over 0..n-m inclusive — beyond that there aren't enough characters left to fit needle, and stopping early also prevents reading past the end of the buffer. As soon as any character mismatches, we break out of the inner loop and slide i forward by one. Worst-case complexity is O(n * m), but with n, m <= 10^4 and a lowercase alphabet, mismatches happen fast and the algorithm is plenty fast in practice. More advanced linear-time algorithms (KMP, Z-function, Rabin-Karp) exist, but the two-pointer scan is the right first tool because it makes the invariants — "we've matched j characters so far starting at i" — easy to see.

逐步走查 / Walkthrough

用範例 haystack = "sadbutsad"、needle = "sad"（n = 9, m = 3）。外層指針 i 從 0 到 n - m = 6；內層指針 j 從 0 到 m - 1 比對字元。

`i`	比對過程 / inner comparison	結果 / result
0	`h[0]='s'==n[0]='s'` ✓；`h[1]='a'==n[1]='a'` ✓；`h[2]='d'==n[2]='d'` ✓	全部匹配 → 回傳 `0` / full match, return `0`

第一次嘗試就成功，函式立刻回傳 0。/ The first attempt matches completely, so we return 0 immediately.

如果改成 needle = "but"，過程會是這樣 / If needle were "but" instead:

`i`	比對 / comparison	結果
0	`h[0]='s'` vs `n[0]='b'` ✗	不匹配，`i++`
1	`h[1]='a'` vs `n[0]='b'` ✗	不匹配，`i++`
2	`h[2]='d'` vs `n[0]='b'` ✗	不匹配，`i++`
3	`h[3]='b'`✓，`h[4]='u'`✓，`h[5]='t'`✓	全部匹配 → 回傳 `3`

Solution — C

// 演算法 / Algorithm:
// 用雙指針暴力搜尋：外層 i 枚舉 haystack 的每一個可能起點，
// 內層 j 比較 needle 的每個字元；若全部相等則回傳 i，否則繼續。
// Brute-force two-pointer scan: outer i picks a start in haystack,
// inner j compares characters with needle; return i on full match.

#include <string.h>  // 為了使用 strlen / for strlen

int strStr(char* haystack, char* needle) {
    int n = (int)strlen(haystack);   // n 是 haystack 長度，先存起來避免重複計算 / cache haystack length, strlen is O(n)
    int m = (int)strlen(needle);     // m 是 needle 長度 / cache needle length

    // 邊界：若 needle 比 haystack 長，不可能匹配；直接回傳 -1
    // Edge case: needle longer than haystack ⇒ impossible match, return -1
    if (m > n) return -1;

    // 外層 i 是 haystack 中嘗試匹配的起點；只需到 n - m 即可
    // Outer i is the candidate start in haystack; only goes up to n - m
    // 因為從更後面開始剩餘字元不夠 m 個，不可能匹配，也避免越界
    // because past that, fewer than m chars remain — impossible to match, and avoids out-of-bounds
    for (int i = 0; i <= n - m; i++) {
        int j = 0;                   // j 是 needle 內的比對指針，從 0 開始 / j scans needle from 0
        // 內層比對：只要 j 還沒走完且字元相等就繼續
        // Inner compare: advance while within needle AND chars match
        while (j < m && haystack[i + j] == needle[j]) {
            j++;                     // 多匹配了一個字元，j 前進 / one more char matched, advance j
        }
        if (j == m) {                // j 走到底 ⇒ 所有 m 個字元都匹配 / j reached m ⇒ full match
            return i;                // 回傳這個起點 / return this starting index
        }
        // 否則：內層中途遇到不匹配，外層 i 自動 +1 再試下一個起點
        // Otherwise: mismatch happened, outer loop increments i and retries
    }

    return -1;                       // 全部起點都試過仍沒找到 / tried all starts, no match
}

Solution — C++

// 演算法 / Algorithm:
// 與 C 版相同：雙指針暴力搜尋，外層枚舉起點，內層比對字元。
// Same as the C version: brute-force two-pointer scan.
// 這裡使用 std::string 的 .size() 與 operator[] 取代 strlen / 指針運算。
// We use std::string's .size() and operator[] instead of strlen / raw pointers.

class Solution {
public:
    int strStr(string haystack, string needle) {
        int n = (int)haystack.size();   // .size() 是 O(1)，回傳字元數 / .size() is O(1), returns char count
        int m = (int)needle.size();     // 同上，存進 int 方便做減法 / store as int to allow subtraction safely

        if (m > n) return -1;           // needle 太長不可能匹配 / needle longer than haystack ⇒ no match

        // 外層 i：嘗試把 needle 對齊到 haystack 的位置 i
        // Outer i: align needle's start with haystack[i]
        // i <= n - m 確保 haystack[i..i+m-1] 不會越界
        // i <= n - m guarantees haystack[i..i+m-1] stays in bounds
        for (int i = 0; i <= n - m; ++i) {
            int j = 0;                  // 內層指針，比對 needle 字元 / inner pointer over needle
            // operator[] 取得指定索引的字元，不做邊界檢查（速度快）
            // operator[] returns the char at that index; no bounds check (fast)
            while (j < m && haystack[i + j] == needle[j]) {
                ++j;                    // 前置遞增，C++ 中習慣寫 ++j 而非 j++ / idiomatic pre-increment in C++
            }
            if (j == m) {               // 全部 m 個字元都對上 / matched all m chars
                return i;               // 回傳起點 / return start index
            }
        }

        return -1;                      // 找不到 / not found
    }
};

複雜度 / Complexity

Time: O(n · m) — 外層最多跑 n - m + 1 次，每次內層最壞情況比對 m 個字元。最壞情況（例如 haystack = "aaaa...aab"、needle = "aaab"）會接近 n · m，但實際上一旦不匹配就會立即跳出，平均很快。Outer loop runs up to n - m + 1 iterations and the inner compare is up to m chars; worst case is n · m, but mismatches usually break early.
Space: O(1) — 只用了幾個整數變數 i, j, n, m，沒有額外配置陣列或雜湊表。Only a handful of integer variables; no extra arrays or maps.

Pitfalls & Edge Cases

外層上界寫錯 / Wrong outer bound：若寫成 i < n 而不是 i <= n - m，當 i 太大時 haystack[i + j] 會讀到字串結尾甚至越界，可能讀到垃圾資料或 segfault。本題用 i <= n - m 保證安全。Using i < n instead of i <= n - m reads past the end of haystack.
m > n 的情況 / Needle longer than haystack：若不先檢查，n - m 在 C 的 int 算術下會變成負數，迴圈直接不跑剛好回傳 -1；但顯式檢查讓意圖更清楚也避免依賴運氣。Explicit if (m > n) return -1; documents intent.
重複呼叫 strlen / Calling strlen in a loop：strlen 是 O(n)，每次迴圈呼叫會把總複雜度推到 O(n² · m)。先存進變數。Cache the length once.
匹配成功時要立即回傳第一個 / Return on first match：題目要求第一次出現的索引，所以一找到就 return，不要繼續往後找。Return immediately on first hit; don't keep scanning.
j == m 判斷而非 j == m - 1 / Check j == m, not j == m - 1：當 j 等於 m 時，代表 j 已經比對完最後一個字元（索引 m-1）並 ++ 過了一次，這才是完整匹配。Off-by-one: full match means j reached m, not m - 1.