I use AI tools as a CTO who still ships production code: What that actually looks like
In the era of vibe coding, developers often split into two camps: "AI will take my job" or "AI is a toy that only outputs garbage."
The reality is that AI amplifies the skills you already have.
Zero skills means zero usable output. But if you know what you are building and why, AI can move you faster than old workflows ever did.
I spent a year as a solo founding CTO building a cybersecurity platform from scratch. I started with no background in security tooling, network scanning, or AI orchestration.
By the end of that year, the platform was running multi-agent security scans for MSPs across real client environments.
I used AI in almost every stage of that process, but I did not prompt my way to a product. I switched between three deliberate operating modes.
Mode 1: AI as research partner
The first thing I needed was not code. It was knowledge.
The platform had to orchestrate tools like OpenVAS, Nmap, and BBot, consume their output, and produce actionable reports.
I did not know their output formats, protocol-level scanning behavior, or what MSPs considered useful findings.
That is a steep learning curve before writing production code.
How I ran research
I ran parallel research sessions across Claude, Gemini, and ChatGPT. Each had a different strength:
- ChatGPT was stronger at high-level planning and stack definition.
- Gemini handled very large context windows and helped synthesize large input sets.
- Claude was strongest as the coding agent implementing from the other two outputs.
The critical step was not reading AI output. Instead, I took snippets from those sessions and dropped them into harness projects.
Harnesses are throwaway repos where I run code, inspect behavior, and iterate until I understand the underlying concepts.
AI could explain Nmap detection in words, but my real understanding came from running scans against a test network and examining the raw output myself.
A simple harness might look like this:
```bash
#!/usr/bin/env bash
# Run Nmap with different options, capture raw XML, compare outputs.
# Target: Nmap's official test host (safe to scan).
TARGET="${1:-scanme.nmap.org}"
OUTDIR="${2:-./nmap-harness-output}"
mkdir -p "$OUTDIR"

# 1. Basic: ports + service/version detection. What does -sV add to the XML?
nmap -oX "$OUTDIR/01-basic-sV.xml" -sV "$TARGET"

# 2. Default scripts (-sC). Script results show up in XML; need to see the schema.
nmap -oX "$OUTDIR/02-scripts-default.xml" -sC -sV "$TARGET"

# 3. Aggressive: OS detection, version, scripts, traceroute. How big does XML get?
nmap -oX "$OUTDIR/03-aggressive.xml" -A "$TARGET"

# 4. SYN scan. Output structure for open vs filtered differs.
nmap -oX "$OUTDIR/04-syn-scan.xml" -sS -sV "$TARGET"
```

Then open the files, diff the XML, and observe what your parser must handle.
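To make that concrete, here is a minimal sketch of the kind of parsing those XML files force on you. It runs against a trimmed, hard-coded sample of Nmap's -oX output rather than live scan files, and the sample values are illustrative:

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative slice of Nmap -oX output. Real files add many
# more attributes, hosts, and optional elements around this skeleton.
SAMPLE = """<?xml version="1.0"?>
<nmaprun scanner="nmap">
  <host>
    <status state="up"/>
    <address addr="45.33.32.156" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open" reason="syn-ack"/>
        <service name="ssh" product="OpenSSH"/>
      </port>
      <port protocol="tcp" portid="9929">
        <state state="filtered" reason="no-response"/>
      </port>
    </ports>
  </host>
</nmaprun>"""

def open_services(xml_text):
    """Return (port, service-name) pairs for open TCP ports.
    Edge case the harness surfaces: <service> is optional, and
    filtered ports often omit it entirely."""
    root = ET.fromstring(xml_text)
    results = []
    for port in root.iterfind("./host/ports/port"):
        state = port.find("state")
        if state is None or state.get("state") != "open":
            continue
        service = port.find("service")
        name = service.get("name") if service is not None else "unknown"
        results.append((int(port.get("portid")), name))
    return results

print(open_services(SAMPLE))  # → [(22, 'ssh')]
```

Writing even a throwaway parser like this is what exposes the schema details (optional elements, state vs. reason, per-protocol ports) that prose explanations gloss over.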
That run-inspect-adjust loop is what makes the research phase stick.
This phase took about two months. It felt slow while leadership asked for prototypes.
But those two months prevented guesswork later. When I started building for real, I understood scanner outputs, edge cases, and what the orchestration layer had to handle.
Mode 2: AI as development accelerator
Once I understood the domain, AI shifted from teacher to coworker.
The workflow that actually works
My pattern is simple: plan first, then instruct execution.
I do not prompt from a blank file. I run a long planning session with ChatGPT to spec stack choices, architecture, and diagrams.
It searches APIs, docs, interfaces, and real-world examples, then generates a technical design doc with Mermaid and pseudocode.
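A scoped Mermaid diagram inside such a design doc might look like this. The component names here are illustrative, not the platform's actual architecture:

```mermaid
flowchart LR
    API[Orchestrator API] --> Q[Job queue]
    Q --> W[Scan worker]
    W --> S[(Scanner: OpenVAS / Nmap / BBot)]
    S --> P[Result parser]
    P --> R[Report generator]
```

Small, scoped diagrams like this stay accurate; it is only full-system diagrams where AI output breaks down.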
I hand that doc to Codex or Claude Code for scaffolding, then have a second agent review scaffold output against the design doc.
The scaffolding agent is rarely perfect. The second reviewer catches drift.
Where AI was unexpectedly valuable
AI was most useful when it found options I did not know existed.
In one iteration, I used the Greenbone Community Edition of OpenVAS. It was a fragile multi-container setup with pinned dependencies.
My architecture spun up EC2 to pull images fresh, but upstream image updates broke internal compatibility.
I asked AI what the likely issue was and how to verify it. It identified version mismatch quickly.
Then it suggested the immauss/openvas all-in-one image, a single-container replacement that removed the dependency choreography.
I would not have found that path as quickly on my own.
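For reference, the single-container approach looks roughly like this as a compose fragment. The image name comes from the immauss/openvas project, but the tag, port mapping, and password value are illustrative; check the project's README before relying on any of them:

```yaml
services:
  openvas:
    image: immauss/openvas:latest   # all-in-one: scanner, manager, and web UI in one container
    ports:
      - "8080:9392"                 # container serves the web UI on 9392
    environment:
      - PASSWORD=changeme           # illustrative admin password
    volumes:
      - openvas-data:/data          # persist feeds and scan data across restarts
volumes:
  openvas-data:
```

One image to pin, one container to restart: the failure mode of upstream images drifting out of sync with each other disappears.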
Mode 3: AI as architectural sounding board
This is the least discussed mode and, for a solo CTO, often the most valuable. After all, I did not have a staff engineer to spar with.
When solving problems like placing scanners inside hardened client networks, my options were either thinking alone or using AI as a sparring partner.
How I used it
I ran parallel prompts across models asking for several viable methods under specific constraints: firewall behavior, private subnet access, OS requirements, and blocked ICMP.
Then I challenged every answer aggressively until a feasible design emerged.
One useful trick: frame the question so the model has to reason, not just agree.
The framing that prevents bad advice
Treat AI as a sparring partner, not as an oracle.
Give it your current state, constraints, tradeoffs, and tentative direction. Then ask it to attack your plan.
I avoid prompts like “what architecture should I use?” with no context.
A better version is: “what architecture is most common for this problem, what alternatives exist, what are the tradeoffs, and what evidence supports each option?”
The rule that applies to every mode
AI output is untrusted until verified.
For code, that means running it and testing behavior. For architecture, that means pressure-testing assumptions against constraints. For research, that means validating claims in docs and real experiments.
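As a sketch of what "running it and testing behavior" looks like in practice: the function below stands in for any AI-generated unit (the name and edge cases are hypothetical, not from the platform), and the assertions are the untrusted-until-verified gate it must pass before integration.

```python
import ipaddress

def hosts_in_scope(cidr: str):
    """Hypothetical AI-generated helper: expand a CIDR block into
    the individual host IPs a scanner should target."""
    net = ipaddress.ip_network(cidr, strict=False)
    return [str(h) for h in net.hosts()]

# Verification gate: pin down behavior before integrating, especially
# the edge cases a confident-sounding model may have glossed over.
assert hosts_in_scope("10.0.0.0/30") == ["10.0.0.1", "10.0.0.2"]  # network/broadcast excluded
assert hosts_in_scope("10.0.0.5/32") == ["10.0.0.5"]              # /32: the single host itself
assert len(hosts_in_scope("10.0.0.0/24")) == 254                  # not 256
print("behavior verified")
```

Five minutes of assertions like these is far cheaper than discovering a scope bug after a scan has already run against a client network.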
This is where teams get burned: they trust output too early and pay for it later in debugging and rework.
What I got wrong
Mistake 1: I skipped foundational research
At first, I jumped straight into prompting and assumed I could reach a prototype without domain understanding.
That created “almost working” code that never became shippable.
The model output sounded confident, but I could not tell what was wrong or guide the model to correct it.
Mistake 2: I did not verify aggressively enough
In early iterations, I integrated output and moved on too quickly.
When downstream failures appeared, I could not isolate which layer introduced the bug.
Once I treated every AI-generated unit as untrusted until personally verified, quality improved and debugging time dropped.
When to do something different
AI fits most of the lifecycle, but it has real weak spots.
- Full-system diagrams: I still draw architecture diagrams, ERDs, and data-flow visuals by hand. AI diagram output is still too error-prone for large systems. Mermaid remains useful for scoped diagrams in design docs.
- Data sensitivity: If data cannot leave controlled environments, I need a local model or a provider with clear sovereignty guarantees. That is usually solvable through enterprise contracts, but it changes the workflow and needs to be addressed upfront.
What’s next
This post is the high-level opener for my AI-assisted development series.
Next, I will draw a hard line between AI-assisted development and vibe coding, with concrete examples of AI-generated changes that looked correct but would have caused production incidents without system-level understanding.
If you want updates when the next post is live, sign up below for alerts.