There is a growing perception that AI coding tools let you sit back and watch the magic happen. I tried that. It did not go well. This is a post about what actually works when you build a real project with AI — the discipline it requires, the things that broke when I gave it too much freedom, and what I would tell someone starting their first AI-assisted project today.
Over the past few months I built a news aggregation platform — 140+ sources, web scrapers, AI summarization, automated deployment — using GitHub Copilot and Google Gemini as my primary development partners. The previous post covered the technical migration. This post is about the human side. The mistakes, the recovery, and the workflow I wish I had from day one.
#The Temptation: Let AI Write Everything
When you first start working with AI coding tools, the temptation is obvious. Describe what you want, let it generate the code, accept the suggestion, move on. And for isolated functions — a regex pattern, a sorting algorithm, a utility helper — this works great.
The problem starts when you let AI drive the architecture. When you stop reviewing the generated code because the last five suggestions were correct. When you accept a “low risk” change without thinking about what it touches.
I learned this the hard way. Twice.
#Contents
- Incident 1 — The posts that vanished
- Incident 2 — The sidebar that refused to appear
- The workflow that finally worked
- What broke the rules
- Practical advice for building with AI
#Incident 1 — The Posts That Vanished
In December 2025 I had a GitHub Actions workflow that generated news posts and deployed them. The workflow ran, showed a green checkmark, and I moved on.
Two hours later I checked the site. Every post was gone. Twenty-plus articles, deleted.
What happened was straightforward. The workflow switched branches to deploy, but the git checkout wiped the working directory before the generated posts were saved. Then it force pushed the empty state. Clean green checkmark. Zero content.
I recovered the posts from git history. Fixed the workflow. The next week, it happened again. Same root cause, same destruction, same recovery.
The lesson was not about git commands. The lesson was that I had accepted a workflow generated by AI without truly understanding the sequence of operations. I read the steps, they looked reasonable, and I moved on. The AI did not understand that git checkout gh-pages would destroy the files it just created. And I did not catch it because I trusted the tool to handle the sequencing.
After the second incident I built a mandatory safety protocol — backup to temp before any branch operation, count posts before and after, abort if the count drops. It has prevented every potential deletion since.
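The protocol can be sketched in a few lines. This is an illustrative Python version, not the production workflow: the directory name, backup path, and `operation` callable are all placeholders for whatever step actually switches branches.

```python
import shutil
import sys
from pathlib import Path

def guarded_branch_operation(posts_dir: str, operation) -> None:
    """Run a branch-switching step with a backup-and-count guard.

    posts_dir and operation are illustrative stand-ins for the real
    posts directory and the real git checkout / deploy step.
    """
    posts = Path(posts_dir)
    before = len(list(posts.glob("*.md")))

    # 1. Backup to temp BEFORE any branch operation touches the working tree.
    backup = Path("/tmp/posts_backup")
    if backup.exists():
        shutil.rmtree(backup)
    shutil.copytree(posts, backup)

    operation()  # e.g. the checkout-and-deploy step

    # 2. Count posts after; restore and abort if the count dropped.
    after = len(list(posts.glob("*.md")))
    if after < before:
        shutil.rmtree(posts, ignore_errors=True)
        shutil.copytree(backup, posts)
        sys.exit(f"ABORT: post count dropped from {before} to {after}; restored backup")
```

The point is not the specific commands; it is that the guard runs before and after the dangerous step, and failure restores state instead of just logging.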
But the real fix was changing how I worked with AI. I stopped letting it generate entire workflows. Instead, I described each step, reviewed the git implications, and validated the sequence manually before committing.
#Incident 2 — The Sidebar That Refused to Appear
In January 2026 I was integrating the Chirpy Jekyll theme. The sidebar navigation — Categories, Tags, Archives, About — was not appearing. Just an empty space where the tabs should be.
I asked Copilot to fix it. It suggested the tab files had wrong front matter. Fixed that. Still broken. It suggested config defaults were overriding the layouts. Fixed that. Still broken. It suggested the workflow timing was off — posts were being captured before the Jekyll build. Fixed that. Still broken.
Three layers of fixes, all of which were technically correct, none of which solved the problem.
The root cause turned out to be the deployment model itself. The workflow was deploying pre-built HTML while keeping a .nojekyll file that told GitHub Pages not to rebuild with Jekyll. The theme needed Jekyll to process its collections. No rebuild meant no collections meant no sidebar.
And buried underneath that was another issue — the index.html file was 1,111 lines of pre-built static HTML with no YAML front matter. Jekyll treats files without front matter as static assets. It served the HTML as-is, no theme, no layout, no styling.
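The front-matter rule is mechanical enough to check automatically. A minimal sketch, assuming Jekyll's documented behavior that a file is only processed if it begins with a `---` delimiter:

```python
def has_front_matter(path: str) -> bool:
    """Jekyll only processes files whose first line opens a YAML
    front matter block ('---'); everything else is copied as-is."""
    with open(path, encoding="utf-8") as f:
        return f.readline().strip() == "---"
```

A check like this on index.html would have flagged the problem immediately: 1,111 lines of HTML, first line not `---`, therefore served raw with no theme applied.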
Five layers of problems. Each fix along the way was valid on its own, but insufficient without the others.
What saved me was my own frustration. After the third failed fix I stopped and told the AI: “Go through your past solutions and see if you are repeating similar mistakes.” That forced a deeper analysis that eventually led to the deployment model discovery. And when the first deployment fix itself broke, I caught it because I actually checked the live site instead of trusting the green checkmark.
The total debugging time was eleven and a half hours across two days. If I had checked what files actually existed on the deployment branch at the start — a thirty-second operation — I would have found the root cause in ten minutes.
#The Workflow That Finally Worked
After those two incidents I established a workflow for working with AI that I have used since. It is not complicated but it requires discipline.
#1. I Direct, AI Executes
I decide the architecture, the sequence of operations, and the validation criteria. AI generates the implementation. This sounds obvious but in practice it is easy to blur the line. When Copilot suggests a multi-step workflow, I do not accept it as a unit. I review each step for side effects — what does this git command do to the working directory, what state does the next step assume, what happens if this HTTP request fails.
#2. Mini CI/CD Pipeline
I treat every AI-generated change the way a real CI/CD pipeline would:
- Plan — describe what needs to change and why
- Implement — let AI generate the code
- Review — read every line, check edge cases, validate assumptions
- Test — run it locally before committing
- Validate — check the actual output, not just the exit code
This mirrors what happens in professional software delivery. The difference is that without AI you write the code yourself and have inherent understanding of it. With AI you have to build that understanding through deliberate review.
#3. Never Trust a Green Checkmark
A passing build means the syntax is valid and the steps completed without errors. It does not mean the output is correct. The run that deleted my posts had a green checkmark. The build that shipped an empty sidebar had a green checkmark. Both were catastrophically wrong.
After every deployment I check the live site. After every data operation I count the records. After every git operation I verify the branch state. This takes thirty seconds and has saved me hours of debugging.
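That thirty-second check can be partially automated. A sketch of post-deployment verification, where the `post-preview` marker is a hypothetical per-post string in the rendered HTML, not the actual markup my theme emits:

```python
import urllib.request

POST_MARKER = 'class="post-preview"'  # hypothetical per-post marker in the page source

def count_rendered_posts(html: str) -> int:
    """Count post entries in the rendered page source."""
    return html.count(POST_MARKER)

def verify_deployment(url: str, min_posts: int) -> None:
    """Fetch the live page and fail loudly if fewer posts rendered than expected."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    n = count_rendered_posts(html)
    if n < min_posts:
        raise RuntimeError(f"only {n} posts rendered, expected at least {min_posts}")
```

The key design choice is validating the actual output (rendered posts on the live page), not the exit code of the workflow that produced it.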
#4. Question “Low Risk”
Every time AI labels a change as “low risk” I now ask myself: what systems does this touch? What are the dependencies? What am I assuming is true that I have not verified?
In complex systems, there is no such thing as a simple change. Removing a .nojekyll file sounds trivial until you realize it is a gate controlling the entire GitHub Pages rebuild pipeline. Changing a workflow step sounds low risk until you realize it affects file availability across git operations.
The correct default is to assume every change is high risk until validated otherwise.
#What Broke the Rules
Beyond the two major incidents, working with AI on a real project surfaced a pattern of smaller issues that accumulated. None of these crashed the site. But they eroded quality in ways that took time to notice and fix.
#AI-Generated Content That Sounded Right but Was Wrong
The Gemini-powered analyst opinions sometimes stated things with confidence that were not supported by the source articles. A headline about a “critical vulnerability in Apache” would become an analyst summary claiming “widespread exploitation observed across enterprise environments” when the original article said no such thing. The model was pattern-matching from training data, not reading the sources.
The fix was constraining the prompt — shorter output, factual tone, explicit instruction to only reference what appeared in the provided articles. But I only caught this because I read the generated summaries against their sources. If I had trusted the output, the site would have been publishing fabricated severity assessments.
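The shape of the constrained prompt looked roughly like this. The exact wording of the production prompt differs; this is an illustration of the constraints, not a copy:

```python
# Illustrative prompt template; the production wording differs,
# but the constraints are the same in kind.
ANALYST_PROMPT = """You are a news analyst. Summarize the articles below.
Rules:
- Maximum 3 sentences.
- Factual tone; no speculation about impact or exploitation.
- Only state claims that appear in the provided articles.
- If the articles do not support a claim, omit it.

Articles:
{articles}
"""
```

The constraint that did the most work was the explicit "only reference what appears in the provided articles" rule; length and tone limits alone did not stop the fabricated severity claims.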
#Keyword Expansion That Matched Too Broadly
The synonym expansion system was designed to catch related terms — so “ransomware” would also match “ransom attack” and “crypto extortion.” But the initial implementation was too aggressive. “Breach” would match articles about “breaching whales” in marine biology feeds. “Cloud” would match weather reports.
Word-boundary regex fixed the false positives. But the broader lesson was that AI-generated regex patterns need adversarial testing. I built a validation suite with eight true positives that should match and four false positives that should not. Every change to the keyword system runs against both sets.
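The fix and the adversarial suite together fit in a few lines. A simplified sketch (the real suite has eight true positives and four false positives; the examples here are the ones from the text):

```python
import re

def keyword_pattern(term: str) -> re.Pattern:
    """Match the term only at word boundaries, case-insensitively,
    so 'breach' no longer matches 'breaching'."""
    return re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)

# Adversarial suite: texts that must match, and near-misses that must not.
TRUE_POSITIVES = [
    ("breach", "a major data breach at the vendor"),
    ("cloud", "misconfigured cloud storage exposed records"),
]
FALSE_POSITIVES = [
    ("breach", "breaching whales off the coast"),
    ("cloud", "cloudy with a chance of rain"),
]
```

Running every keyword change against both lists is what turns "looks right" into "tested against the exact failure that bit me last time."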
#Workflow Steps That Assumed State
AI-generated workflow steps consistently assumed the previous step succeeded and left the environment in a specific state. Branch is checked out. File exists. Directory was created. These assumptions are usually true. When they are not, the failure is silent and cascading.
The fix was adding explicit checks — test -f _config.yml || exit 1 — at every step that depended on a previous step’s output. Defensive programming. Not elegant, but it catches failures where they happen instead of three steps later.
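In the Python parts of the pipeline, the same idea looks like a small guard helper rather than a shell one-liner. A minimal sketch, with the step name and path as placeholders:

```python
import sys
from pathlib import Path

def require(path: str, step: str) -> None:
    """Fail at the step that depends on the file,
    not three steps later when the symptom surfaces."""
    if not Path(path).exists():
        sys.exit(f"{step}: required file missing: {path}")
```

Called as `require("_config.yml", "jekyll-build")` at the top of each step, it turns a silent cascading failure into an immediate, located one.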
#The “Just One More Change” Trap
The most insidious pattern was scope creep driven by AI’s willingness to keep going. “While we are in this file, let me also fix…” led to commits that touched five unrelated things. When something broke, isolating the cause meant untangling changes that should have been separate commits.
I now enforce atomic commits — one logical change per commit, with a clear description of what changed and why. This makes git bisect actually useful and makes rollbacks possible without losing unrelated work.
#Practical Advice for Building with AI
If you are about to start a web app project using AI coding tools, here is what I would tell you.
#Set Up Your Safety Net First
Before you write a single line of application code:
- Initialize git. Commit early, commit often, commit atomically.
- Set up a local dev environment. Do not develop directly on production. This sounds obvious but AI tools make it tempting to just push and see what happens.
- Create a validation script. Even if it just checks that your build succeeds and your index page renders. Run it after every change.
- Document your deployment model. Write down how code gets from your editor to the live site. Every step. Every branch operation. Every file that needs to exist.
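A validation script really can be that small. A starting-point sketch, assuming a Jekyll-style build that writes to `_site/` (adapt the path and the marker to your own stack):

```python
from pathlib import Path

def validate_build(site_dir: str = "_site") -> list[str]:
    """Return a list of problems with the build output; empty means OK.

    The _site path and the <title> marker are assumptions based on a
    Jekyll-style build; substitute whatever your stack produces.
    """
    errors = []
    index = Path(site_dir, "index.html")
    if not index.exists():
        errors.append("index.html missing from build output")
    else:
        html = index.read_text(encoding="utf-8", errors="replace")
        if "<title>" not in html:
            errors.append("index.html has no <title>; layout may not have applied")
    return errors
```

Even this trivial version would have caught both of my incidents: the missing-posts deploy (no output files) and the unstyled index page (no layout applied).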
#Understand What AI Is Good At
AI excels at:
- Boilerplate code — component scaffolding, config files, utility functions
- Pattern implementation — “write a retry wrapper with exponential backoff”
- Code transformation — “convert this Jekyll frontmatter to Astro format”
- Debugging known patterns — “this regex is not matching, fix it”
- Documentation — generating comments, READMEs, type definitions
AI is unreliable at:
- Architecture decisions — it will give you something that works, not necessarily something that scales or maintains well
- Deployment sequencing — it does not understand git branch state transitions
- Content accuracy — it generates plausible text, not verified facts
- Risk assessment — it labels everything as “low risk” because it cannot model systemic dependencies
- Knowing when to stop — it will keep generating code until you tell it to stop
#Review AI Output Like You Review a Junior Developer’s PR
Read every line. Ask why. Check edge cases. Look for assumptions about state. Test the unhappy path. Do not merge just because the tests pass.
This is not about distrusting the tool. It is about recognizing that AI generates code based on patterns, not understanding. It does not know that your git checkout will wipe the working directory. It does not know that your .nojekyll file controls a pipeline gate. You need to bring that context.
#Build Incrementally
Do not describe your entire application and ask AI to build it. Break it into the smallest unit of work that is independently testable:
- Build the data model. Test it.
- Build the fetch layer. Test it.
- Build the processing pipeline. Test it.
- Build the output renderer. Test it.
- Build the deployment workflow. Test it.
- Connect them. Test the integration.
Each step should produce a working, committed, validated checkpoint. If step 4 breaks, you roll back to step 3’s checkpoint and try again. You do not lose everything.
#Keep a Lessons Learned Document
This might be the most valuable thing I did. Every time something broke — and things will break — I documented what happened, why it happened, and what the fix was. Not just the technical fix but the process failure that led to it.
That document became the guardrails for every future change. “Before any git branch operation, backup to temp first” is not a rule I invented from theory. It is a rule I earned by losing twenty posts. Twice.
#The Honest Summary
Building with AI is faster than building without it. My news aggregation engine — 3,200 lines of Python, 10 web scrapers, fuzzy deduplication, AI summarization — would have taken months without Copilot and Gemini. It took weeks.
But faster does not mean easier. The speed creates a new failure mode: you move so fast that you skip the understanding phase. And without understanding, you cannot debug, you cannot validate, and you cannot prevent the next incident.
The workflow that works is simple. You provide the direction. AI provides the implementation. You validate the result. No shortcuts on the validation step. No trusting green checkmarks. No accepting “low risk” without proof.
The tools are genuinely powerful. But power without discipline is just a faster way to break things.
That's it.