@andrewislearning

Day 01 · April 29, 2026

Training AI to be my video editor

You don't need to read this

If you have Claude Code, two commands give you the same plugin I used to plan this series. Tell Claude what your show is about; it does the rest.

$ /plugin marketplace add andrewjiang/andrewislearning
$ /plugin install day-01-series-planner@andrewislearning

"I got outplayed on my own video."

My video on X got nearly a million views. Someone reposted it to Instagram, gave away my demo link, and got more comments than my original got likes. So instead of becoming a better editor, I'm spending 30 days teaching an AI to be one for me.

How to set this up yourself

Ten steps. Each one is a small, single-purpose script that Claude orchestrates. Every tool listed runs on your laptop or hits an API — no proprietary editor, no subscription beyond Claude Code itself.

Step 0

Plan the series with Claude before you record anything

The biggest mistake people make starting a daily series is recording first and figuring out the arc later. The arc is the hard part. So I opened Claude Code and asked it to design the whole 30-day series upfront — the through-line, what each day delivers, who the audience is, what makes a viewer come back tomorrow. That conversation became series/SERIES_PLAN.md, a single source of truth the rest of the pipeline reads from.

The plugin above packages that planning prompt. Install it, tell Claude what your show is about, and you'll get the same kind of plan tailored to your topic. Edit it freely — it's a markdown file, not a sealed document.

Tools: Claude Code · the day-01-series-planner plugin
Step 1

Record on your phone — the only manual step

Phone in selfie mode, talk for 60–90 seconds. Shoot b-roll separately: your screen, your hands, anything visual that supports the talk track. Save everything into raw/day_NN/. No tripod, no teleprompter — the AI handles the structure later, and the looser the take, the more natural the final cut feels.

Tools: your phone
Step 2

Transcribe locally with faster-whisper

Whisper turns audio into text with word-level timestamps. Word-level matters because every later step — cuts, captions, b-roll sync — keys off when each word was spoken, not just which sentence it was in.

Run it locally rather than through an API: no upload, no key, and about 30 seconds for a 90-second clip on an M-series Mac.

$ python pipeline/transcribe.py raw/day_01/selfie_part_a.MOV
# → transcripts/day_01/selfie_part_a.json
#    { words: [{word, start, end}, ...] }
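A minimal sketch of what a transcribe script along these lines could look like. The module layout, model size, and output path are assumptions, not the repo's actual code; the faster-whisper calls themselves are the library's real API.

```python
import json
import sys
from pathlib import Path


def words_to_json(segments):
    """Flatten faster-whisper segments into one word-level timestamp list."""
    words = []
    for seg in segments:
        for w in seg.words or []:
            words.append({"word": w.word.strip(), "start": w.start, "end": w.end})
    return {"words": words}


def main(video_path: str) -> None:
    # Imported here so the pure helper above stays dependency-free.
    from faster_whisper import WhisperModel

    model = WhisperModel("small", device="auto", compute_type="int8")
    segments, _info = model.transcribe(video_path, word_timestamps=True)

    src = Path(video_path)
    out = Path("transcripts") / src.parent.name / (src.stem + ".json")
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(words_to_json(segments), indent=2))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Everything downstream only ever touches the JSON, so the model choice here is swappable without breaking later steps.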
Tools: faster-whisper (Python, runs locally on CPU/GPU)
Step 3

Tighten with a JSON config — Claude proposes the cuts

Cuts are described in JSON, not edited on a timeline. Claude reads the transcript, proposes cuts (dead air, "uhh"s, false starts, retakes), and writes them as keep-ranges. ffmpeg executes.

{
  "input": "raw/day_01/selfie_part_a.MOV",
  "keep": [
    [0.5, 4.2],
    [5.1, 12.8],
    [13.4, 21.0]
  ]
}

Editing this way means revisions are diffs, not click-and-drag. If a cut feels wrong, you change a number. If you want a different take of a phrase, you swap the range. Claude handles the iteration — you watch and react.
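One way to execute a keep-list like that is ffmpeg's select/aselect filters. This is a sketch, not the pipeline's actual script, and select-based cutting is one of several valid approaches:

```python
import json
import subprocess


def keep_filter(ranges):
    """Build a select expression that keeps only the listed [start, end] ranges."""
    return "+".join(f"between(t,{a},{b})" for a, b in ranges)


def apply_cuts(config_path: str, out_path: str) -> None:
    cfg = json.loads(open(config_path).read())
    expr = keep_filter(cfg["keep"])
    vf = f"select='{expr}',setpts=N/FRAME_RATE/TB"  # drop frames, close video gaps
    af = f"aselect='{expr}',asetpts=N/SR/TB"        # same for the audio samples
    subprocess.run(
        ["ffmpeg", "-y", "-i", cfg["input"], "-vf", vf, "-af", af, out_path],
        check=True,
    )
```

Because the whole edit is a pure function of the JSON, rerunning it after Claude tweaks a range is deterministic: same config in, same cut out.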

Tools: Claude (proposes the cuts) · ffmpeg (executes them)
Step 4

Assemble: selfie audio is the spine, b-roll swaps the visual

The selfie audio plays as one continuous track. B-roll clips swap the visual at scripted moments — the audio underneath never breaks. This is what makes short-form video feel produced rather than lecture-y: the rhythm of your voice drives the visual cadence.

The script is annotated with cues that Claude inserts based on whatever raw clips exist:

[Selfie] Day 1 of training AI to be my video editor.
[B-roll: the IG repost] Someone reposted it to Instagram...
[Selfie] I got outplayed on my own work.

Each [B-roll: ...] marker is a window where the visual cuts away while the selfie audio keeps playing. ffmpeg's overlay filter does the heavy lifting; Claude picks the timing.
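A tiny parser for those markers might look like this (a hypothetical helper, not code from the repo):

```python
import re

# Matches "[Selfie] line" or "[B-roll: clip name] line".
CUE = re.compile(r"\[(Selfie|B-roll: (?P<clip>[^\]]+))\]\s*(?P<line>.*)")


def parse_script(text: str):
    """Split an annotated script into (visual, clip-or-None, spoken line) cues."""
    cues = []
    for m in CUE.finditer(text):
        clip = m.group("clip")
        cues.append(("broll" if clip else "selfie", clip, m.group("line").strip()))
    return cues
```

Matching each cue's spoken line against the word timestamps from Step 2 is what turns these annotations into overlay in/out points.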

Tools: Claude (annotates the script) · ffmpeg (concat + overlay)
Step 5

Caption with ASS subtitles, burned into the pixels

Captions are baked into the video itself, not added as a platform overlay. That way they look identical on Instagram, TikTok, and any embed — and you get full control over font, color, and timing.

The format is ASS (Advanced SubStation Alpha), the only common subtitle format that supports per-word styling. The pipeline reads word timestamps from the transcript and groups them into 1–3 word chunks. Multi-word phrases like AI agent, Claude plugin, and 30 days stay together as a single chunk so they don't get split across reads. Keywords get emphasis — a salmon color (#FFA395) and ~15% larger size — and the result is an .ass file that ffmpeg burns into the video.

Two details that took iterations to get right. First, emphasis color: red and yellow both clashed against warm skin tones, so the keyword color landed on a soft salmon — pink-leaning, on-brand, and easy on the eye for larger text. Second, text replacements (DAY ONE → DAY 1, RECORDED → RECORDING) catch words Whisper transcribes one way that need to display another — useful when your CTA keyword has to match exactly for the comment-funnel to fire.
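The chunking step could be sketched like this. It's a simplified stand-in for pipeline/caption.py: the phrase list, keyword set, and chunk size here are illustrative, not the pipeline's real values.

```python
KEEP_TOGETHER = [["ai", "agent"], ["claude", "plugin"], ["30", "days"]]  # illustrative
KEYWORDS = {"ai", "claude"}                                              # illustrative


def chunk_words(words, max_len=3):
    """Group word dicts into 1-3 word caption chunks, keeping set phrases intact."""
    chunks, cur = [], []
    i = 0
    while i < len(words):
        # If a keep-together phrase starts here, flush and emit it as one chunk.
        phrase = next(
            (p for p in KEEP_TOGETHER
             if [w["word"].lower() for w in words[i:i + len(p)]] == p),
            None,
        )
        if phrase:
            if cur:
                chunks.append(cur)
                cur = []
            chunks.append(words[i:i + len(phrase)])
            i += len(phrase)
            continue
        cur.append(words[i])
        i += 1
        if len(cur) == max_len:
            chunks.append(cur)
            cur = []
    if cur:
        chunks.append(cur)
    return chunks
```

Each chunk then becomes one ASS Dialogue event, with keyword words wrapped in override tags for the salmon color and size bump.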

Tools: custom Python (pipeline/caption.py) · ffmpeg subtitles filter
Step 6

Polish audio and color in one ffmpeg pass

Audio chain — four filters in series:

highpass=f=80         # remove low-end rumble
afftdn                # spectral denoise
acompressor=...       # even out levels
loudnorm=I=-14:LRA=11 # IG/TikTok loudness target

Color: small contrast bump (1.04), light saturation (1.07), neutral-warm temperature (~6300K). Anything more and the video starts looking filtered, which on a personal-vlog format reads as inauthentic.

Speed: 1.08× at the very end. Captions are baked in before the speedup, so they speed up naturally with the audio — no resync needed and no timing drift.
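Wired together, the whole pass might look like this. The filter names are real ffmpeg filters; the compressor settings are elided in the chain above, so this sketch just takes acompressor's defaults.

```python
import subprocess

# Audio chain from the writeup; compressor settings elided there, defaults here.
AUDIO = "highpass=f=80,afftdn,acompressor,loudnorm=I=-14:LRA=11"
# Light grade: small contrast bump, light saturation, neutral-warm temperature.
VIDEO = "eq=contrast=1.04:saturation=1.07,colortemperature=temperature=6300"


def build_filters(speed: float = 1.08):
    """Append the speed-up last so captions burned in earlier stay in sync."""
    af = f"{AUDIO},atempo={speed}"
    vf = f"{VIDEO},setpts=PTS/{speed}"
    return af, vf


def polish(src: str, dst: str) -> None:
    af, vf = build_filters()
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-af", af, "-vf", vf, dst],
        check=True,
    )
```

Running audio, color, and speed in a single pass also means the video is only re-encoded once, which keeps quality loss to one generation.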

Tools: ffmpeg (loudnorm, eq, colortemperature, atempo)
Step 7

Publish to IG and TikTok with one API call

I use Post for Me — one POST request and the video is queued for both Instagram Reels and TikTok at the same time. Claude writes the caption from the script, picks hashtags from a pre-approved list, and the API handles the upload.

The API key lives in macOS Keychain, never in a .env file that could get committed:

api_key = subprocess.check_output([
    "security", "find-generic-password",
    "-s", "POSTFORME_API_KEY", "-w"
]).decode().strip()

The publish config is per-day JSON — caption, hashtags, platforms, the path to the final mp4. Same shape every day, only the content changes.
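A day's config might look something like this — the field names and values here are illustrative, not the actual schema; check the Post for Me docs for the real payload shape:

```json
{
  "caption": "Day 1 of training AI to be my video editor...",
  "hashtags": ["#ai", "#claudecode", "#buildinpublic"],
  "platforms": ["instagram", "tiktok"],
  "video": "final/day_01/day_01_final.mp4"
}
```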

Tools: Post for Me API · macOS Keychain · Claude (writes the caption)
Step 8

Funnel comments to DMs with ManyChat

The CTA at the end of every video asks viewers to comment a keyword (for Day 1, that's DAY 1). ManyChat watches the post, catches the keyword, and DMs the link to this guide page automatically. No manual DMing — and the comments themselves become social proof on the post, which boosts reach.

Tools: ManyChat (Instagram automation)
Step 9

Ship the guide page on Vercel

This page is a single HTML file in a public GitHub repo. Vercel watches the repo; every push to main redeploys in about ten seconds. No build step, no framework tax — just CSS variables and one tiny script for the copy button.

If you're starting from scratch: connect Vercel to your GitHub repo, set web/ as the output directory, and you're done. The whole config is six lines of vercel.json.
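For reference, a static setup like that can be expressed in a vercel.json along these lines. This is a guess at the shape, not the repo's actual file:

```json
{
  "outputDirectory": "web",
  "cleanUrls": true
}
```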

Tools: GitHub · Vercel (static hosting)

The script

About 50 seconds, 142 words. Same structure the plugin will help you build for your own series — hook, payoff, challenge, CTA.

Selfie

Day 1 of training AI to be my video editor.

My video on X got nearly a million views.

B-roll: the IG repost

Someone reposted it to Instagram, gave away my demo link, and got nearly two thousand comments.

Selfie

I got outplayed on my own work. But it got me to think — I can do that. I'm not gonna become an editor. I can teach an AI to be one for me.

So here's the challenge: 30 days. Every edit through an AI agent. Every post through an AI agent. Just my phone and a laptop.

B-roll: Claude planning the series

Step one wasn't even recording. It was opening Claude. I asked it to plan the whole 30 days, beginning to end.

B-roll: writing / editing / publishing

Write. Edit. Publish. Every day.

Selfie

I'm packaging everything into a free guide and a Claude plugin. Follow and comment 'DAY 1' for the link.

The source

Browse, fork, or run any of it locally — everything lives in the andrewjiang/andrewislearning repo on GitHub.

Tomorrow — Day 02

The pipeline starts ingesting raw phone footage. Follow @andrewislearning for the next 29 days.