THU.JUN.18
2026
23:37:13
← back to modules MODULE · 03 · HOMESTREAM
0 / 10 chapters complete · 0%

The Database — Designing the Media Table

Time to design the brain of HomeStream. We'll build a single media table that knows about every audio and video file you own — paths, titles, durations, sizes. Get this right and the next eight chapters fall into place easily.
One media table. Path as UNIQUE so we don't double-import. type as an ENUM. Big enough integer for file size. utf8mb4 for the names. Indexes on the columns you'll filter and search by.

The schema-design mindset

Before we write any SQL, let's talk about the actual job of this table for thirty seconds. The table answers questions like "what's in my library?", "show me only the videos", "find anything with 'jazz' in the title", and "where on disk does file ID 42 actually live?" If we keep those four use cases in our head while designing, the column choices basically pick themselves.

Database design is one of those areas where 5 minutes of thinking up front saves 5 hours of regret later. The good news: for a personal project at this scale, "good enough" is genuinely good enough. We don't need separate tables for artists and albums and tracks like Spotify does. One table, denormalized, works fine for thousands of files. If you ever scale this to "I have 200,000 tracks", we'll talk again — but probably you won't.

Walking through the schema

Here's the full SQL we're about to write. Skim it first, then I'll explain every column choice.

CREATE DATABASE IF NOT EXISTS homestream
    CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

USE homestream;

CREATE TABLE media (
    id           INT PRIMARY KEY AUTO_INCREMENT,
    path         VARCHAR(1024) NOT NULL,
    type         ENUM('audio','video') NOT NULL,
    title        VARCHAR(255) NOT NULL,
    artist       VARCHAR(255) NULL,
    album        VARCHAR(255) NULL,
    duration_s   INT NULL,
    size_b       BIGINT NULL,
    mime         VARCHAR(100) NULL,
    has_thumb    TINYINT(1) NOT NULL DEFAULT 0,
    added_at     DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE KEY uq_path (path),
    INDEX idx_type (type),
    INDEX idx_title (title),
    INDEX idx_artist (artist)
);

Decoding the column choices

Let's go through these one at a time because every choice has a reason. id is the boring obvious primary key — auto-incremented integer, the unique handle we'll use everywhere else (in URLs like /stream.php?id=42, in foreign keys someday, in the queue feature later). Never skip the id column on a table even if you think you have a "natural" unique key. Future-you wants it.

path is where the file actually lives on disk — something like /home/erictey/media/music/Daft Punk/Discovery/01 One More Time.mp3. We're using VARCHAR(1024) because paths can get long, especially for nested folders and long titles. The UNIQUE KEY on path is doing real work: it stops the scanner from importing the same file twice. When we INSERT IGNORE during a scan, the database silently skips duplicates because the unique constraint catches them. Free deduplication.

type is an ENUM with two values. ENUMs are a controversial SQL feature — some developers hate them because changing the values later requires an ALTER TABLE. For our case, that's fine; we genuinely only have audio and video. If you wanted to be paranoid you could use VARCHAR(10) with a CHECK constraint instead. Either works. I'll use ENUM here because the values are stable and it's slightly more efficient on disk.

🐍 Python: SQL ENUMs are kind of like Python's enum.Enum but enforced at the database level. You'd model the matching Python (or PHP) side with an actual enum so type-checking catches typos.

title, artist, album are all VARCHAR(255). Why 255? Because that used to be a meaningful boundary in older MySQL versions (varchar columns ≤255 used 1 byte for length, longer ones used 2). It doesn't really matter anymore but the convention stuck. artist and album are nullable because not every file has them — videos don't have artists, some MP3s have broken or missing tags. NULL means "we don't know," which is different from empty string "" (which would mean "we explicitly know it's blank"). Use NULL when you mean unknown.

duration_s stores seconds as an integer. We could store milliseconds for more precision, but for a media server, second-resolution is fine. Notice it's nullable — we won't know the duration until the metadata-extraction step runs in chapter 7. Files will exist in the table with NULL duration for a while; that's by design.

size_b stores file size in bytes. Critical detail: this is BIGINT, not INT. A regular INT in MariaDB is 4 bytes signed, which maxes out at about 2.1 billion. That's only about 2 GB. Plenty of movies are bigger than that. BIGINT is 8 bytes signed, max ~9.2 quintillion bytes, which is enough for every movie ever made plus your imagination. Always use BIGINT for file sizes. Burn this lesson in — using INT for file sizes is a classic real-world bug.

mime is the MIME type ("audio/mpeg", "video/mp4", etc.). We'll need this for the streaming endpoint to tell the browser what kind of file it's sending. We could detect it on every request, but storing it once at scan time is faster.

has_thumb is a boolean (TINYINT(1) is the MariaDB convention for boolean — there's no real BOOL type, it's just an alias). We'll flip it to 1 after we generate a thumbnail for a video. Lets the library page know whether to show a poster image or a fallback.

added_at uses DATETIME with a default of CURRENT_TIMESTAMP. This means: "every time you insert a row without specifying added_at, fill in the current date and time." We'll use this for the "recently added" feature in the polish chapter. Set it and forget it.

The indexes — and why each one matters

Indexes are the fastest cheap performance win in databases. Without them, every query that filters by a column has to scan the entire table. With them, the database can find matching rows almost instantly. For a small library it doesn't matter much — but build the habit now and you'll never have a query mysteriously slow down at scale.

We added four indexes. The UNIQUE KEY on path serves double duty as both a constraint (preventing dupes) and an index (lookups by path are fast). The INDEX on type speeds up "show me only the audio" filtering — by far our most common query. The INDEX on title speeds up alphabetical sorting and title search. The INDEX on artist speeds up "show me all the Daft Punk" type queries.

Could you skip the indexes and still have it work? Sure, with a thousand files. But "add indexes for columns you actually filter and sort by" is one of those tiny habits that takes 10 extra seconds and pays off for years.

Time to actually create this. SSH into Lubuntu (or open a terminal in VS Code's remote session) and:

  1. sudo mariadb — opens the MariaDB prompt as root.
  2. Paste the entire CREATE DATABASE + CREATE TABLE block from above. Hit enter after the closing semicolon.
  3. SHOW TABLES; — should list "media" alone.
  4. DESCRIBE media; — should print every column with its type. Read down the list; verify each one matches what we discussed.
  5. SHOW INDEX FROM media; — should show four indexes including PRIMARY (the id) and uq_path (the unique constraint).

Now we need a database user for the app to connect as — same principle as the medtrack_user from Part 2. Never have a web app connect to the database as root.

Still at the MariaDB prompt:

CREATE USER 'homestream_user'@'localhost' IDENTIFIED BY 'pick-a-strong-password';
GRANT SELECT, INSERT, UPDATE, DELETE ON homestream.* TO 'homestream_user'@'localhost';
FLUSH PRIVILEGES;
EXIT;

Pick a real password and save it in your password manager. We'll use it in the next chapter when we connect from PHP.

Build: The Connection Helper + Smoke Test

Goal: write the db() connection helper for HomeStream, then prove the connection works by querying our brand-new (and empty) media table. This is basically the same db helper from Part 2, but pointed at the new database.

I know the connection helper looks like boring boilerplate, but it's the thing every other file in HomeStream is going to call. Writing it once, well, means we never think about it again. That's a lovely feeling.

  1. Create /home/erictey/server/homestream/lib/db.php.
  2. Paste this in (sub in your real password):
    <?php
    declare(strict_types=1);
    
    function db(): PDO {
        static $pdo = null;
        if ($pdo === null) {
            $pdo = new PDO(
                'mysql:host=127.0.0.1;dbname=homestream;charset=utf8mb4',
                'homestream_user',
                getenv('DB_PASS') ?: 'your-password-here',
                [
                    PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
                    PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
                    PDO::ATTR_EMULATE_PREPARES   => false,
                ]
            );
        }
        return $pdo;
    }
  3. Create a tiny smoke-test file /home/erictey/server/homestream/public/db-test.php:
    <?php
    declare(strict_types=1);
    require __DIR__ . '/../lib/db.php';
    
    try {
        $count = (int) db()->query("SELECT COUNT(*) FROM media")->fetchColumn();
        echo "✓ Connected to homestream. Media count: $count";
    } catch (Throwable $e) {
        echo "✗ Failed: " . $e->getMessage();
    }
  4. We need Apache to actually serve from this new folder. The simplest way for now is just to bookmark the URL through the existing server alias. Hit http://192.168.0.19/server/homestream/public/db-test.php in your browser.
  5. You should see "✓ Connected to homestream. Media count: 0". That zero is everything — it means the connection worked AND the media table exists AND we successfully queried it. The empty count makes sense; we haven't scanned anything yet.

If it failed, the error tells you exactly what's wrong. "Access denied" = wrong password. "Unknown database" = you forgot to run CREATE DATABASE. "Could not find driver" = you skipped php-mysql in Part 1. Each error message points at one specific fix.

Stretch goals if you're feeling extra:

  • Insert a fake row by hand at the MariaDB prompt with INSERT INTO media (path, type, title) VALUES ('/fake/song.mp3', 'audio', 'Test Song');, then refresh the smoke test — the count should be 1.
  • Try inserting the same path twice in a row. The unique constraint should reject the second one with a clear error.
  • Look at SHOW VARIABLES LIKE 'character_set%' in MariaDB and notice all the utf8mb4 entries — proof that the Unicode story is working end to end.

What you flexed: SQL schema design with thoughtful column types, ENUM vs VARCHAR, BIGINT for file sizes (the classic real-world gotcha), index strategy, database user with least privilege, PDO connection helper using the static-cache pattern. That's a solid day's work, and we haven't even loaded a single file yet.