Mar 1, 2026

On AI as Force Multiplier

Seed note

After building a production PII detection engine with LLMs at CommandK and now working on AI-powered workspace planning at Saltmine, I have some evolving thoughts on where AI fits in engineering work.

The demo-to-production gap

The gap between an LLM demo and a production LLM system is enormous. Demos handle the happy path. Production has to answer the questions a demo never asks:

What happens when the model hallucinates?
What happens when latency exceeds your budget?
What happens when the API is down?
What happens when the model changes behavior after a provider update?
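To make the first three questions concrete, here is a minimal Python sketch, not the code from any system mentioned in this note. It assumes the model call is an ordinary function passed in as `llm_call`; the names `detect_with_budget`, `regex_fallback`, and `drop_unsupported` are hypothetical. The idea is to put a latency budget around the call, fall back to something deterministic on timeout or error, and discard findings the input does not actually contain. It does not address behavior drift after a provider update, which needs a regression suite rather than a wrapper.

```python
import concurrent.futures
import re
import time

# Illustrative pattern only; real PII detection needs far more than one regex.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def regex_fallback(text):
    """Deterministic fallback: cheap and always available, but lower recall."""
    return [("email", m.group()) for m in EMAIL_RE.finditer(text)]


def drop_unsupported(findings, text):
    """Cheap hallucination guard: keep only values literally present in the input."""
    return [(label, value) for label, value in findings if value in text]


def detect_with_budget(text, llm_call, budget_s=2.0):
    """Run llm_call under a latency budget; fall back on timeout or failure.

    Returns (findings, source) so callers can log which path produced the result.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(llm_call, text)
    try:
        findings = future.result(timeout=budget_s)
        return drop_unsupported(findings, text), "llm"
    except concurrent.futures.TimeoutError:
        # The slow call keeps running in its worker thread; we just stop waiting.
        return regex_fallback(text), "fallback:timeout"
    except Exception:
        # API down, malformed response, and so on.
        return regex_fallback(text), "fallback:error"
    finally:
        pool.shutdown(wait=False)
```

Returning the source label alongside the findings is a deliberate choice in this sketch: it lets you measure how often production is actually running on the fallback path.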

At CommandK, we ran three detection strategies in parallel: regex patterns, ML classifiers, and LLM-based classification. No single approach was reliable enough alone. The ensemble was.
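A rough Python sketch of what combining three detectors can look like. The combination rule here (a simple vote, with disagreements routed to review) is an assumption for illustration, not a description of how the CommandK engine merged results. `ml_stub` and `llm_stub` are hypothetical stand-ins for a real classifier and a real model call, and the detectors run sequentially to keep the example short.

```python
import re
from collections import Counter

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def regex_detector(text):
    """Pattern-based detector: precise on well-formed values, blind to everything else."""
    return {("email", m.group()) for m in EMAIL_RE.finditer(text)}


def ml_stub(text):
    """Hypothetical stand-in for an ML classifier: flags any token containing '@'."""
    return {("email", tok.strip(".,")) for tok in text.split() if "@" in tok}


def llm_stub(text):
    """Hypothetical stand-in for an LLM call: finds emails and also guesses at names."""
    findings = {("email", tok.strip(".,")) for tok in text.split() if "@" in tok}
    if "Alice" in text:
        findings.add(("name", "Alice"))
    return findings


def combine(text, detectors, min_votes=2):
    """Vote across detectors.

    Findings reported by at least min_votes detectors are treated as confirmed;
    anything reported by fewer is set aside for a human to look at.
    """
    votes = Counter()
    for detect in detectors:
        # Each detector returns a set, so it contributes at most one vote per finding.
        votes.update(detect(text))
    confirmed = {finding for finding, n in votes.items() if n >= min_votes}
    needs_review = set(votes) - confirmed
    return confirmed, needs_review


confirmed, needs_review = combine(
    "Alice can be reached at alice@example.com today",
    [regex_detector, ml_stub, llm_stub],
)
```

In this example the email is reported by all three detectors, while the name is reported only by the LLM stand-in, so it lands in `needs_review` instead of being trusted outright.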

Force multiplier, not replacement

The best AI integrations I have built amplify human capability rather than replace it:

The PII detection engine suggests findings, but a human reviews and confirms
The workspace recommendation engine generates options, but a planner evaluates and decides
Claude Code proposes architecture, but I review and adapt

The pattern: AI handles the exhaustive search; humans handle the judgment.
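One way to encode that split in code is to make every AI output a suggestion that stays pending until a person decides on it. The sketch below is a minimal Python illustration under that assumption; `Suggestion`, `ReviewQueue`, and the reviewer name are hypothetical and not taken from any of the systems above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"


@dataclass
class Suggestion:
    label: str
    value: str
    status: Status = Status.PENDING
    reviewer: Optional[str] = None  # who made the call, for auditability


@dataclass
class ReviewQueue:
    items: List[Suggestion] = field(default_factory=list)

    def propose(self, label, value):
        """The AI side: add a suggestion. It has no effect until reviewed."""
        suggestion = Suggestion(label, value)
        self.items.append(suggestion)
        return suggestion

    def decide(self, suggestion, reviewer, accept):
        """The human side: record the decision and who made it."""
        suggestion.status = Status.CONFIRMED if accept else Status.REJECTED
        suggestion.reviewer = reviewer

    def confirmed(self):
        """Only confirmed suggestions are allowed to flow downstream."""
        return [s for s in self.items if s.status is Status.CONFIRMED]


queue = ReviewQueue()
email = queue.propose("email", "alice@example.com")
name = queue.propose("name", "Alice")
queue.decide(email, reviewer="planner-1", accept=True)
queue.decide(name, reviewer="planner-1", accept=False)
```

The design choice that matters is that nothing downstream reads `items` directly; it reads `confirmed()`, so an unreviewed suggestion cannot act on its own.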

What I am still figuring out

How to evaluate LLM outputs systematically when there is no single correct answer
The right balance between cost and quality when you are making thousands of API calls per day
How to build user trust in AI-assisted workflows where the stakes are real
Whether RAG or fine-tuning is the right approach for domain-specific knowledge (leaning RAG for now)

This note is a seed. These thoughts will mature as I build more.

Connected ideas

Budding

What Production Really Means

I learned what 'production' really means when the users were healthcare professionals in hospitals.
