
Strings — More Functions Than You Need

Strings are everywhere: every form input, every database value, every API response, every piece of text shown to a user. PHP has roughly 80 string functions in its standard library, but realistically you'll use maybe 15 in everyday work. This chapter is that working set, plus one critical security function that prevents one of the most common web vulnerabilities, cross-site scripting.
Length, case, trim, search, replace, substring, split/join, sprintf for formatting. Use the mb_* versions for user input. Always run user data through htmlspecialchars before outputting to HTML. Regex when nothing simpler works.

The mental note about bytes vs characters

Before we dive into functions, one quick conceptual thing that matters a lot. Strings in PHP are sequences of bytes. ASCII characters take 1 byte each, but UTF-8 (which handles emoji and non-Latin scripts) uses 1 to 4 bytes per character. This distinction matters more than you'd expect, and we'll keep coming back to it.

Length, case, trim

The most basic operations — how long is this string, change its case, remove whitespace:

strlen("hello");                 // 5
mb_strlen("héllo");              // 5 — multibyte-safe (use for user input)

strtolower("HELLO");             // hello
strtoupper("hello");             // HELLO
ucfirst("hello world");          // Hello world
ucwords("hello world");          // Hello World

trim("  hello  ");               // hello
trim("--hello--", "-");          // hello (custom chars to trim)

Important rule that catches everyone: any string that came from a user is potentially UTF-8 multi-byte. Use the mb_ versions (mb stands for "multibyte") instead of the plain versions for user content. Otherwise emoji and non-Latin scripts break in subtle, hard-to-debug ways.

Here's the trap in action: a username "héllo" (with an é, which is 2 bytes in UTF-8). strlen() returns 6 (bytes). mb_strlen() returns 5 (actual characters). If your validation says "username must be ≤ 20 characters" and uses strlen, a user with an accented name has a different effective limit than someone with plain ASCII. Inconsistent, weird, and you'll get a bug report from one specific user that you can't reproduce. Use mb_* for anything that touched the outside world.
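
The trap can be reproduced in a few lines. This is a minimal sketch, assuming the mbstring extension is loaded and the input is UTF-8; the function name is_valid_username and the limit of 20 are illustrative, not part of the course code.

```php
<?php
// Hypothetical validator: the name and the default limit are illustrative.
function is_valid_username(string $name, int $max = 20): bool {
    // Count characters, not bytes, so accented names get the same limit.
    $length = mb_strlen($name, 'UTF-8');
    return $length > 0 && $length <= $max;
}

$name = str_repeat("\u{E9}", 15);       // fifteen "é": 15 characters, 30 bytes

var_dump(strlen($name));                // int(30): bytes
var_dump(mb_strlen($name, 'UTF-8'));    // int(15): characters
var_dump(is_valid_username($name));     // bool(true)
var_dump(strlen($name) <= 20);          // bool(false): a byte-based check rejects it
```

The same 15-character name passes or fails depending only on which function does the counting.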

🐍 Python: Python 3 strings are Unicode by default — no mb_* dance required. len("héllo") just returns 5. PHP has this two-tier API because PHP strings are byte sequences, not characters. One of the rare wins for Python's design.

Search and replace

Finding stuff in strings, replacing pieces:

str_contains("hello world", "world");        // true   (PHP 8+)
str_starts_with("hello", "he");              // true
str_ends_with("file.php", ".php");           // true

strpos("hello world", "world");              // 6 (position) or false
strrpos("hello/world/test", "/");            // 11 — LAST occurrence (r = reverse)

str_replace("world", "Eric", "hello world"); // "hello Eric"
str_replace(["a","b"], ["1","2"], "abc");    // "12c"

Quick history note: before PHP 8, there was no str_contains. People wrote strpos(...) !== false, which works but is awkward: strpos returns the position (an int) or false (not found), so a match at position 0 looks like "not found" if you compare with != instead of !==. PHP 8 finally added the clean version, along with str_starts_with and str_ends_with. Use str_contains when you mean "does this contain that," and reach for strpos only when you actually need the position.
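
The position-0 pitfall is easy to see in isolation. A small sketch (the variable names are arbitrary):

```php
<?php
$haystack = "hello world";

// "hello" is found at position 0, and 0 is loosely equal to false.
$pos = strpos($haystack, "hello");

$loose  = ($pos != false);    // false: wrongly reports "not found"
$strict = ($pos !== false);   // true: the correct pre-PHP 8 idiom
$modern = str_contains($haystack, "hello");  // true (PHP 8+)

var_dump($pos, $loose, $strict, $modern);
// int(0) bool(false) bool(true) bool(true)
```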

Substring

substr("hello world", 6);          // "world"
substr("hello world", 0, 5);       // "hello"
substr("hello", -2);               // "lo" — negative = from the end

The negative-offset trick is super handy. substr($s, -3) means "last 3 bytes," which is the last 3 characters for ASCII text. Saves you from calculating string length first. For user input, mb_substr accepts negative offsets too.
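
Because substr works on bytes, it can cut a multi-byte character in half. A short sketch of the difference, assuming mbstring is available and the string is UTF-8:

```php
<?php
$s = "h\u{E9}llo";   // "héllo": 5 characters, 6 bytes in UTF-8

$bytes = substr($s, 0, 2);              // "h" plus only the first byte of "é"
$chars = mb_substr($s, 0, 2, 'UTF-8');  // "hé"

var_dump(mb_check_encoding($bytes, 'UTF-8'));  // bool(false): broken character
var_dump($chars === "h\u{E9}");                // bool(true)
var_dump(mb_substr($s, -3, null, 'UTF-8'));    // string(3) "llo"
```

A truncated preview built with substr can therefore end in an invalid byte sequence, which browsers usually render as a replacement character.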

Split and join — explode and implode

Two of the most common operations, and they're inverses of each other:

explode(",", "a,b,c");                       // ['a', 'b', 'c'] — string → array
explode(",", "a,b,c", 2);                    // ['a', 'b,c'] — limit splits

implode(", ", ['a', 'b', 'c']);              // "a, b, c" — array → string
implode(['a', 'b', 'c']);                    // "abc" — no separator

str_split("hello", 2);                       // ['he', 'll', 'o'] — split by chunks

Memorize: explode takes a string apart, implode glues an array back together. They're a pair.
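
The pair often shows up together as a round trip: split, clean up the pieces, glue back. A minimal sketch; normalize_list is a hypothetical helper, not something from the course code.

```php
<?php
// Hypothetical helper: tidy a comma-separated list typed by a user.
function normalize_list(string $raw): string {
    $parts = explode(",", $raw);                         // string → array
    $parts = array_map('trim', $parts);                  // strip spaces around each item
    $parts = array_filter($parts, fn($p) => $p !== '');  // drop empty items
    return implode(", ", $parts);                        // array → string
}

echo normalize_list(" aspirin ,ibuprofen,, vitamin d "), "\n";
// aspirin, ibuprofen, vitamin d
```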

Padding and formatting

Useful when generating fixed-width output, formatted money, log messages, etc:

str_pad("7", 3, "0", STR_PAD_LEFT);          // "007" — zero-padded
str_repeat("-", 20);                         // "--------------------"

number_format(1234567.891, 2);               // "1,234,567.89" — thousands separators
number_format(1234567.891, 2, ",", " ");     // "1 234 567,89" — custom separators

sprintf("Hello %s, you are %d.", "Eric", 25);    // "Hello Eric, you are 25."
sprintf("%05.2f", 3.1);                          // "03.10" — width 5, zero-padded, 2 decimals

sprintf deserves a closer look — it's incredibly useful and shows up in basically every codebase. It uses C-style format strings. The placeholders you'll memorize first:

  • %s — substitute a string
  • %d — substitute an integer
  • %f — substitute a float
  • %05.2f — float, width 5, zero-padded, 2 decimal places

Memorize %s, %d, %f and you've covered 90% of cases.
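
Width and alignment modifiers are what make sprintf useful for fixed-width output. A sketch with made-up rows; the data and column widths are arbitrary.

```php
<?php
// Hypothetical data: name, quantity, price.
$rows = [
    ["Aspirin", 2, 4.5],
    ["Vitamin D", 10, 12.25],
];

$lines = [];
foreach ($rows as [$name, $qty, $price]) {
    // %-10s: string, left-aligned, padded to width 10
    // %3d:   integer, right-aligned, width 3
    // %7.2f: float, width 7, 2 decimal places
    $lines[] = sprintf("%-10s|%3d|%7.2f", $name, $qty, $price);
}

echo implode("\n", $lines), "\n";
// Aspirin   |  2|   4.50
// Vitamin D | 10|  12.25
```

Note that sprintf pads by bytes, so columns containing non-ASCII text may not line up.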

HTML escaping — the security rule

Pay close attention to this section. It's one of the most important security rules in web programming, and the most common bug class in PHP apps that don't follow it.

Any time you output user-provided data into HTML, you have to "escape" it — convert special HTML characters into safe equivalents. Without escaping, a user can inject HTML or JavaScript into your page, which is called Cross-Site Scripting (XSS). XSS can do everything from stealing other users' session cookies to silently rewriting your page contents.

$user_input = "<script>alert('xss')</script>";

echo $user_input;                            // ☠ XSS vulnerability!
                                             // Browser sees a script tag, runs it.

echo htmlspecialchars(
    $user_input,
    ENT_QUOTES | ENT_SUBSTITUTE,
    'UTF-8'
);
// Output: &lt;script&gt;alert(...)&lt;/script&gt;
// Browser sees escaped characters and displays them as text.

htmlspecialchars converts the dangerous characters (<, >, ", ', &) into HTML entities. The browser renders those entities as the original visible characters, but they're no longer treated as HTML syntax.
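
Script tags are not the only attack shape. Quotes matter as soon as user data lands inside an HTML attribute, which is why ENT_QUOTES is in the flag list. A sketch with a made-up hostile value:

```php
<?php
// Hypothetical hostile input aimed at breaking out of an attribute.
$name = '" onmouseover="alert(1)';

$safe = htmlspecialchars($name, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo '<input name="q" value="' . $safe . '">', "\n";
// <input name="q" value="&quot; onmouseover=&quot;alert(1)">

echo htmlspecialchars("it's", ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'), "\n";
// it&#039;s
```

With the quotes escaped, the value stays inside the attribute instead of starting a new one.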

The rule: run user input through htmlspecialchars before putting it in HTML output. No exceptions. Ever. This single habit prevents the large majority of XSS bugs in HTML text and attribute values.

Since you'll do this constantly, wrap it in a tiny helper called e() (you already did this in the Functions chapter). You'll see it used in every example in the rest of this course:

function e(string $s): string {
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo "Hello, " . e($user_name) . ".";
?>
<!-- or in templates, outside the PHP block: -->
<h1>Hello, <?= e($user_name) ?></h1>

🐍 Python: htmlspecialchars ≈ Python's html.escape() (stdlib) or Jinja2's autoescaping. Same XSS prevention story everywhere. Jinja2 escapes automatically once autoescaping is enabled (Flask turns it on for HTML templates by default), which is safer; PHP makes you call it manually (hence the e() helper trick).

Regular expressions — when nothing simpler works

Regular expressions ("regex" or "regexp") are mini-languages for describing patterns in text. They're a power tool — incredibly useful for input validation and complex text matching, but easily abused for things that should be simpler.

preg_match('/^[a-z0-9_]+$/i', $username);         // 1 if matches, 0 if not
preg_match_all('/\d+/', "abc 12 de 34 f", $m);    // find all numbers
preg_replace('/\s+/', ' ', $text);                // collapse whitespace
preg_split('/[\s,;]+/', $csv);                    // split on multiple delimiters

A few quick syntax notes on PHP regex. The pattern is wrapped in delimiters, usually / but any non-alphanumeric, non-whitespace character other than a backslash works (#pattern#, ~pattern~). Use a non-slash delimiter when the pattern itself contains literal slashes (URLs, paths). After the closing delimiter, you can add flags: i for case-insensitive, m for multiline, u for UTF-8 patterns and subjects, etc.

Regex is genuinely powerful but easy to misuse. Don't reach for it when a simple str_contains would do the job — regex is more expensive to run and much harder to read. But for input validation patterns (email shapes, slug formats, phone numbers), it's exactly the right tool.

Analogy: regex is like a chainsaw. Great for cutting trees. Don't use it to slice bread.
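
The calls above are easier to follow with their results spelled out. A short sketch; the /meds/42/vitamin-d path is a made-up example, not a route from the course app.

```php
<?php
// preg_match_all returns the number of matches; $m[0] holds the full matches.
$count = preg_match_all('/\d+/', "abc 12 de 34 f", $m);
var_dump($count);   // int(2)
var_dump($m[0]);    // "12" and "34"

// Capture groups land in $parts[1], $parts[2], ...
// The # delimiter avoids escaping the slashes inside the pattern.
if (preg_match('#^/meds/(\d+)/([a-z0-9-]+)$#', "/meds/42/vitamin-d", $parts) === 1) {
    [$whole, $id, $slug] = $parts;   // $id = "42", $slug = "vitamin-d"
}

$collapsed = preg_replace('/\s+/', ' ', "too   many \n spaces");  // "too many spaces"
$fields    = preg_split('/[\s,;]+/', "a, b;c  d");                // a, b, c, d
```

preg_match can also return false when the pattern itself is invalid, so comparing against 1 with === is the safest check.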

Multi-byte aware versions — recap

One more time, because it matters. Whenever you process user input that might contain non-ASCII characters, use the mb_* versions:

mb_strlen($s)            // not strlen
mb_substr($s, 0, 10)     // not substr
mb_strtolower($s)        // not strtolower
mb_strpos($s, $needle)   // not strpos

The plain versions count bytes. With UTF-8, one character can be 1–4 bytes. Funny when you notice it in a tutorial. Less funny when it silently breaks your "username must be ≤ 20 characters" validation and locks out a real user with an accented name.

Habit: if a string came from outside (user, file, network), use mb. If you generated it yourself in pure ASCII (a hardcoded error message), plain versions are fine and a hair faster. When in doubt, mb.
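
The byte-versus-character gap shows up in positions and case conversion as well as in lengths. A brief sketch, assuming mbstring is loaded and the strings are UTF-8:

```php
<?php
$s = "h\u{E9}llo w\u{F6}rld";   // "héllo wörld"

var_dump(strpos($s, "w"));                  // int(7): byte offset
var_dump(mb_strpos($s, "w", 0, 'UTF-8'));   // int(6): character offset

$lower = mb_strtolower("\u{C9}COLE", 'UTF-8');   // "ÉCOLE" → "école"
var_dump($lower === "\u{E9}cole");               // bool(true)
```

Mixing the two families is where bugs creep in: an offset from strpos fed into mb_substr points at the wrong character.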

Build: Add Search to the Meds List

Quick build for this chapter: add a search box to your meds page that filters items by name, using case-insensitive substring matching. Real-feeling search UI from minimal code.

  1. Open lib/meds-helpers.php and add a search function:
    function search_meds(array $meds, string $query): array {
        $query = mb_strtolower(trim($query));
        if ($query === '') return $meds;
    
        return array_filter($meds, function($med) use ($query) {
            return str_contains(mb_strtolower($med['name']), $query);
        });
    }
  2. In meds.php, add to the pipeline (after sort and filter):
    $meds = search_meds($meds, $_GET['q'] ?? '');
  3. Add a search form at the top of the page body:
    <form>
      <input name="q" value="<?= e($_GET['q'] ?? '') ?>" placeholder="search meds...">
      <button>Search</button>
      <input type="hidden" name="filter" value="<?= e($_GET['filter'] ?? 'all') ?>">
      <input type="hidden" name="sort"   value="<?= e($_GET['sort'] ?? 'name') ?>">
    </form>
  4. Try searching: ?q=vit should match "Vitamin D" and "Multivitamin." Case-insensitive, substring match.
  5. Notice that search, filter, and sort all work together via URL params. Bookmarkable, sharable state.
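
To check the search function from step 1 on its own, here is a small sketch with made-up sample data (PHP 8+ for str_contains). It also shows one detail worth knowing: array_filter keeps the original array keys.

```php
<?php
function search_meds(array $meds, string $query): array {
    $query = mb_strtolower(trim($query));
    if ($query === '') return $meds;

    return array_filter($meds, function ($med) use ($query) {
        return str_contains(mb_strtolower($med['name']), $query);
    });
}

// Hypothetical sample data.
$meds = [
    ['name' => 'Vitamin D'],
    ['name' => 'Ibuprofen'],
    ['name' => 'Multivitamin'],
];

$found = search_meds($meds, '  VIT ');
print_r(array_keys($found));     // keys 0 and 2: the gap at 1 is preserved
$found = array_values($found);   // reindex to 0, 1 if the keys matter
```

Reindexing matters mostly when the result is passed to json_encode or accessed by position; a plain foreach over the filtered array works either way.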

Stretch goals:

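  • Highlight the matched text in the results by wrapping matches in <mark> tags. str_ireplace works for a first pass, but it swaps in the query's casing ("vit" replaces "Vit"); a regex that reuses the matched text keeps the original casing.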
  • Show a "no results" message when the search returns empty.
  • Add a "clear search" link if query is non-empty.
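
One possible approach to the highlight stretch goal, offered as a sketch and not as the course's official solution. The highlight function is hypothetical; it splits the text on the query with preg_split, then escapes every piece separately so the only raw HTML in the output is the <mark> tags themselves.

```php
<?php
function e(string $s): string {
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Hypothetical helper for the highlight stretch goal.
function highlight(string $text, string $query): string {
    if ($query === '') return e($text);

    // Split on the query, case-insensitively, keeping the matched pieces.
    $pattern = '/(' . preg_quote($query, '/') . ')/iu';
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) return e($text);   // e.g. invalid UTF-8 input

    $out = '';
    foreach ($parts as $i => $part) {
        // Odd indexes are the captured matches; every piece is escaped.
        $out .= ($i % 2 === 1) ? '<mark>' . e($part) . '</mark>' : e($part);
    }
    return $out;
}

echo highlight("Vitamin D", "vit"), "\n";
// <mark>Vit</mark>amin D
echo highlight("<b>Multivitamin</b>", "VIT"), "\n";
// &lt;b&gt;Multi<mark>vit</mark>amin&lt;/b&gt;
```

Escaping after splitting, not before, avoids matching inside entities such as &amp; and keeps the original casing of the matched text.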

What you flexed: mb_strtolower for proper Unicode handling, str_contains for substring matching, closures with use() capture, preserving filter/sort state via hidden inputs, the e() helper for safe output. This is what real production search forms look like.