Metadata — Real Titles, Artists, and Durations
What's an ID3 tag, actually?
Quick crash course in case "ID3" is mystery jargon for you. Audio files (especially MP3) have a chunk at the very start or end of the file that holds metadata — the title, the artist, the album, the year, sometimes the album art. It's called the "ID3 tag" and it's been around forever. iTunes wrote them, Spotify reads them, every music player on earth respects them. They're how your phone knows the song name even though the file is called "track01.mp3."
So when we say "extract metadata," we mean: open the audio file, find the ID3 chunk, parse out the structured fields, save them into our database. Then the library page can display "One More Time" instead of "01 - One More Time."
For video files there's no exact ID3 equivalent — different container formats (MP4, MKV, WebM) store metadata differently. We use a tool called ffprobe (part of ffmpeg) to read whatever the container has and tell us the duration, bitrate, codec, etc.
Composer for the first time in HomeStream
Up until now we've been writing all the code ourselves. But reading ID3 tags is a solved problem — there's a battle-tested PHP library called getID3 that handles every weird edge case (broken tags, multiple ID3 versions, exotic formats). No need to reinvent. We pull it in with Composer.
cd /home/erictey/server/homestream
composer require james-heinrich/getid3
Composer downloads getID3 into vendor/, updates composer.json, and registers its classes in the autoloader. From this point, any script that starts with require __DIR__ . '/../vendor/autoload.php'; can do new getID3() and it just works.
🐍 Python: composer require ≈ pip install && pip freeze > requirements.txt in one command. vendor/ ≈ a virtualenv. The autoloader ≈ the Python import system finding modules.
Using getID3
The API is delightfully simple. Two lines:
$id3 = new getID3();
$info = $id3->analyze($path);
What comes back is a giant nested array with every piece of info getID3 could find. The bits we care about live at predictable spots:
$title = $info['tags']['id3v2']['title'][0] ?? null;
$artist = $info['tags']['id3v2']['artist'][0] ?? null;
$album = $info['tags']['id3v2']['album'][0] ?? null;
$duration = isset($info['playtime_seconds']) ? (int) round($info['playtime_seconds']) : null;
The ?? chains gracefully handle "this tag doesn't exist": a missing key yields null instead of a warning. Notice we ask for id3v2 specifically; there's also id3v1 (older, less data). These short snippets only read id3v2 to keep things simple. The backfill script later in this chapter falls back to id3v1 when id3v2 is missing, and files with neither just end up with null tags.
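The id3v2-then-id3v1 lookup can be wrapped in a small helper. This is a minimal sketch, not part of getID3: firstTag() is a hypothetical name, and the $info array below is a hand-written stand-in shaped like what analyze() returns, so it runs without any audio file. Unlike a bare ?? chain, this version also treats an empty string as "missing."

```php
<?php
declare(strict_types=1);

/**
 * Return the first non-empty value for $field, checking tag formats in order.
 * firstTag() is a hypothetical helper, not a getID3 function.
 */
function firstTag(array $info, string $field, array $formats = ['id3v2', 'id3v1']): ?string
{
    foreach ($formats as $format) {
        $value = $info['tags'][$format][$field][0] ?? null;
        if (is_string($value) && trim($value) !== '') {
            return trim($value);
        }
    }
    return null;
}

// Stand-in for the array analyze() returns (values are made up).
$info = [
    'tags' => [
        'id3v2' => ['title' => ['One More Time']],
        'id3v1' => ['title' => ['One More Time'], 'artist' => ['Example Artist']],
    ],
];

var_dump(firstTag($info, 'title'));  // found in id3v2
var_dump(firstTag($info, 'artist')); // only in id3v1, so the fallback kicks in
var_dump(firstTag($info, 'album'));  // in neither, so null
```

Passing a different $formats list would let the same helper cover other tag families, if you ever need them.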
ffprobe for video duration
For video files, we shell out to ffprobe — a CLI tool that comes with the ffmpeg package. It prints info about a video file in a parseable format. We ask it for JSON because PHP loves parsing JSON.
$cmd = sprintf(
'ffprobe -v error -show_entries format=duration -of json %s',
escapeshellarg($path)
);
$json = shell_exec($cmd);
$data = json_decode($json ?: '{}', true);
$duration = isset($data['format']['duration']) ? (int) round((float) $data['format']['duration']) : null;
Two critical things to call out here. First: escapeshellarg wraps the path in quotes and escapes any shell-special characters. This is the shell equivalent of prepared statements — never concatenate user input (or even database content) directly into a shell command without escaping. Files with spaces or apostrophes or weird chars in their names will break things otherwise (and at worst, they're a remote command execution vector).
Second: shell_exec runs the command and returns its stdout. Useful but only safe when you're calling a known binary with sanitized args. Some hosting providers disable it; on your own Lubuntu box it's fine.
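To see what escapeshellarg actually does, here is a small sketch that builds the command and parses the output as two separate steps. Both function names are hypothetical, and the file paths are made up for illustration. The quoting shown in the comments is what PHP produces on Linux; Windows quotes differently.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: build the ffprobe command for one file.
function ffprobeDurationCommand(string $path): string
{
    return sprintf(
        'ffprobe -v error -show_entries format=duration -of json %s',
        escapeshellarg($path)
    );
}

// Hypothetical helper: turn ffprobe's JSON output into whole seconds, or null.
// $json is left untyped because shell_exec can return a string, false, or null.
function parseFfprobeDuration($json): ?int
{
    $data = json_decode($json ?: '{}', true);
    if (!is_array($data) || !isset($data['format']['duration'])) {
        return null;
    }
    // ffprobe reports the duration as a string such as "212.480000".
    return (int) round((float) $data['format']['duration']);
}

// An apostrophe in the name: closed, escaped, reopened.
echo escapeshellarg("it's here.mp4"), "\n";   // 'it'\''s here.mp4'

// Unescaped, the shell would treat everything after ";" as a second command.
echo ffprobeDurationCommand('/media/clip.mp4; touch /tmp/pwned'), "\n";

var_dump(parseFfprobeDuration('{"format": {"duration": "212.480000"}}')); // int(212)
var_dump(parseFfprobeDuration(null));       // NULL
var_dump(parseFfprobeDuration('not json')); // NULL
```

Keeping the parsing in its own function means you can test it with canned JSON without having ffprobe or a video file around.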
Install ffmpeg if you haven't already — it provides ffprobe:
sudo apt install ffmpeg -y
ffprobe -version
Should print a version string. ffmpeg also gives us ffmpeg itself which we'll use for thumbnails in the next chapter. Two tools for the price of one apt command.
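If you would rather have a script check for ffprobe itself than find out from a pile of missing durations, something like this could work. It is a sketch under assumptions: binaryAvailable() is a hypothetical helper, and it relies on shell_exec being enabled and on a POSIX shell that supports command -v (true on Lubuntu).

```php
<?php
declare(strict_types=1);

// Hypothetical helper: true if the shell can find the named binary on PATH.
function binaryAvailable(string $name): bool
{
    // `command -v` prints the resolved path of a command, or nothing if it is unknown.
    $found = shell_exec('command -v ' . escapeshellarg($name) . ' 2>/dev/null');
    return is_string($found) && trim($found) !== '';
}

if (binaryAvailable('ffprobe')) {
    echo "ffprobe is available\n";
} else {
    echo "ffprobe not found. Install it with: sudo apt install ffmpeg\n";
}
```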
Build: bin/metadata.php — The Backfill Script
Goal: a CLI script that loops over all media rows where the metadata fields are still NULL, extracts what it can, and updates the row. Run it once after the scanner. Re-run any time you add new files. After it finishes, the library page suddenly looks way better.
- Create /home/erictey/server/homestream/bin/metadata.php.
- Paste this in:

    #!/usr/bin/env php
    <?php
    declare(strict_types=1);

    require __DIR__ . '/../vendor/autoload.php';
    require __DIR__ . '/../lib/db.php';

    $id3 = new getID3();

    // Grab everything missing key metadata.
    $stmt = db()->query("
        SELECT id, path, type, title
        FROM media
        WHERE duration_s IS NULL
        ORDER BY id
    ");

    $updated = 0;

    foreach ($stmt as $row) {
        $path = $row['path'];

        if (!is_file($path)) {
            echo "  skip (missing): $path\n";
            continue;
        }

        $title = null;
        $artist = null;
        $album = null;
        $duration = null;

        if ($row['type'] === 'audio') {
            $info = $id3->analyze($path);
            $title = $info['tags']['id3v2']['title'][0] ?? $info['tags']['id3v1']['title'][0] ?? null;
            $artist = $info['tags']['id3v2']['artist'][0] ?? $info['tags']['id3v1']['artist'][0] ?? null;
            $album = $info['tags']['id3v2']['album'][0] ?? $info['tags']['id3v1']['album'][0] ?? null;
            $duration = isset($info['playtime_seconds']) ? (int) round($info['playtime_seconds']) : null;
        } else {
            // video: use ffprobe
            $cmd = sprintf(
                'ffprobe -v error -show_entries format=duration -of json %s 2>/dev/null',
                escapeshellarg($path)
            );
            $json = shell_exec($cmd);
            $data = json_decode($json ?: '{}', true);
            if (isset($data['format']['duration'])) {
                $duration = (int) round((float) $data['format']['duration']);
            }
        }

        // Only overwrite title if we got something real; otherwise keep the
        // filename-derived placeholder.
        $update = db()->prepare("
            UPDATE media
            SET title = COALESCE(:title, title),
                artist = :artist,
                album = :album,
                duration_s = :duration
            WHERE id = :id
        ");
        $update->execute([
            'title' => $title,
            'artist' => $artist,
            'album' => $album,
            'duration' => $duration,
            'id' => $row['id'],
        ]);

        $updated++;
        $label = $artist ? "$artist - $title" : ($title ?: basename($path));
        echo "  ✓ $label (" . ($duration ?? '?') . "s)\n";
    }

    echo "\nDone. Updated $updated items.\n";

  Then make it executable:

    chmod +x bin/metadata.php

- Run it: bin/metadata.php. You should see each file logged with its newly-extracted info.
- Refresh the library page. Notice the durations are now populated, and the titles look way better.
- Click a track. The play page now shows artist and album in the meta line.
Stretch goals:
- Extract more fields — year, genre, track number — and add columns for them.
- For videos, also pull resolution (1080p, 4K) and codec info from ffprobe. Display them in the library.
- Re-extract on demand by passing --force to ignore the "WHERE duration_s IS NULL" filter.
- Pull album art (it's embedded in the ID3 tag as a binary blob), save it to storage/art/, and show it in the player.
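As a head start on the --force stretch goal, here is one possible shape for it. It is only a sketch: backfillQuery() is a hypothetical helper, and it assumes the same media table and columns the backfill script queries. It takes the $argv array PHP gives every CLI script and returns the SELECT to run.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: choose the backfill SELECT based on the CLI arguments.
function backfillQuery(array $argv): string
{
    // $argv[0] is the script name, so only look at what comes after it.
    $force = in_array('--force', array_slice($argv, 1), true);
    $where = $force ? '' : 'WHERE duration_s IS NULL';
    return trim("SELECT id, path, type, title FROM media $where") . ' ORDER BY id';
}

echo backfillQuery(['bin/metadata.php']), "\n";
echo backfillQuery(['bin/metadata.php', '--force']), "\n";
```

In the script you would call db()->query(backfillQuery($argv)) in place of the hard-coded SELECT. Keep in mind that a forced run rewrites artist, album, and duration_s on every row, while COALESCE still protects the title.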
What you flexed: Composer require for a real library, getID3's analyze() and the nested array structure, ffprobe with JSON output, escapeshellarg for shell safety, COALESCE in SQL to avoid overwriting good data with NULL, and an UPDATE pattern that backfills cleanly. This is one of those chapters where the codebase suddenly feels mature.