On AI as Force Multiplier
Seed note
After building a production PII detection engine with LLMs at CommandK and now working on AI-powered workspace planning at Saltmine, I have some evolving thoughts on where AI fits in engineering work.
The demo-to-production gap
The gap between an LLM demo and a production LLM system is enormous. Demos handle the happy path. Production handles:
- What happens when the model hallucinates?
- What happens when latency exceeds your budget?
- What happens when the API is down?
- What happens when the model changes behavior after a provider update?
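As a rough illustration of the first three questions, here is a minimal Python sketch of a wrapper that falls back to a cheaper path when the model call is slow, fails, or returns something outside the expected output space. Everything in it is an assumption made for illustration: `call_with_fallback`, the `is_valid` check, and the stand-in callables are hypothetical names, not code from CommandK or Saltmine. It does not address the fourth question; catching a provider-side behavior change needs regression evaluation over time, not a per-call guard.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Tuple


def call_with_fallback(
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    text: str,
    is_valid: Callable[[str], bool],
    timeout_s: float = 2.0,
) -> Tuple[str, str]:
    """Return (result, source), where source records which path produced the result.

    primary   -- hypothetical LLM-backed call
    fallback  -- hypothetical cheaper path (e.g. rules) used when primary is unusable
    is_valid  -- hypothetical check that rejects outputs outside the expected space
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        result = pool.submit(primary, text).result(timeout=timeout_s)
    except FutureTimeout:
        # Latency budget exceeded. The worker thread is not killed; it is
        # simply no longer waited on.
        return fallback(text), "fallback:timeout"
    except Exception:
        # The API is down or the call raised for any other reason.
        return fallback(text), "fallback:error"
    finally:
        pool.shutdown(wait=False)

    if not is_valid(result):
        # The output is not something the caller can use (a crude hallucination guard).
        return fallback(text), "fallback:invalid"
    return result, "primary"
```

Returning the source alongside the result makes it possible to log how often each failure mode occurs, which is usually the first number worth watching in production.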
At CommandK, we ran three detection strategies in parallel: regex patterns, ML classifiers, and LLM-based classification. No single approach was reliable enough alone. The ensemble was.
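To make the ensemble idea concrete, here is a minimal Python sketch that runs three detectors concurrently and keeps a label only when enough of them agree. The combination rule (a simple vote with `min_votes`) is an assumption for illustration; this note does not say how the real system merged results. The three detector functions are toy stand-ins, not the actual regex patterns, classifier, or LLM prompt.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set

# Hypothetical pattern; real PII regexes would be far more extensive.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

Detector = Callable[[str], Set[str]]


def regex_detector(text: str) -> Set[str]:
    return {"email"} if EMAIL_RE.search(text) else set()


def ml_detector(text: str) -> Set[str]:
    # Stand-in for a trained classifier.
    return {"email"} if "@" in text else set()


def llm_detector(text: str) -> Set[str]:
    # Stand-in for an LLM classification call.
    lowered = text.lower()
    found: Set[str] = set()
    if "@" in lowered:
        found.add("email")
    if "ssn" in lowered:
        found.add("ssn")
    return found


def detect(text: str, detectors: List[Detector], min_votes: int = 2) -> Set[str]:
    """Run all detectors in parallel and keep labels with at least min_votes votes."""
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        results = list(pool.map(lambda d: d(text), detectors))

    votes: Dict[str, int] = {}
    for labels in results:
        for label in labels:
            votes[label] = votes.get(label, 0) + 1
    return {label for label, count in votes.items() if count >= min_votes}
```

With `min_votes=2`, a label reported by only one detector is dropped. Lowering the threshold to 1 would turn the ensemble into a union, trading precision for recall, which may suit a workflow where a human reviews every finding anyway.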
Force multiplier, not replacement
The best AI integrations I have built amplify human capability rather than replace it:
- The PII detection engine suggests findings, but a human reviews and confirms
- The workspace recommendation engine generates options, but a planner evaluates and decides
- Claude Code proposes architecture, but I review and adapt
The pattern: AI handles the exhaustive search, humans handle the judgment.
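One way to encode that split in code is to treat every model output as a suggestion that stays inert until a person rules on it. The Python sketch below is a hypothetical data model under that assumption; `Suggestion`, `review`, and `actionable` are illustrative names, not part of any system described here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"


@dataclass
class Suggestion:
    summary: str            # what the model proposes
    confidence: float       # model-reported score, informational only
    status: Status = Status.PENDING
    reviewer: str = ""


def review(suggestion: Suggestion, reviewer: str, accept: bool) -> Suggestion:
    """Record a human decision on a suggestion."""
    suggestion.status = Status.CONFIRMED if accept else Status.REJECTED
    suggestion.reviewer = reviewer
    return suggestion


def actionable(suggestions: List[Suggestion]) -> List[Suggestion]:
    """Only human-confirmed suggestions are acted on, regardless of confidence."""
    return [s for s in suggestions if s.status is Status.CONFIRMED]
```

The design choice worth noting is that `confidence` never promotes a suggestion on its own: a high score can order the review queue, but only a reviewer changes the status.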
What I am still figuring out
- How to evaluate LLM outputs systematically when there is no single correct answer
- The right balance between cost and quality when you are making thousands of API calls per day
- How to build user trust in AI-assisted workflows where the stakes are real
- Whether RAG or fine-tuning is the right approach for domain-specific knowledge (leaning RAG for now)
This note is a seed. These thoughts will mature as I build more.