AI in software development: strategic decisions, and a few uncomfortable questions
"3x productivity with AI." Productivity of what, measured how? There is a huge difference between AI producing code faster and your company delivering more value.
The new story in software engineering sounds pretty straightforward: stop writing code, start writing specs; AI handles the rest, and we simply “supervise.”
It sounds efficient, modern, and looks great on a dark-background slide. But before redesigning how a team works around that idea, it is worth pausing and asking a few questions that never show up in a 10-minute demo.
The new bottleneck is called “specification”
The pitch is simple: if we write good specs, AI will generate good code. Problem solved.
But there is one small detail: not everyone can write a truly good specification. You need someone who understands the business domain, knows the architecture, remembers where the hidden skeletons are, and has lived through enough production incidents to anticipate edge cases. In other words, the same people who could already solve the problem end to end without AI.
So it is worth asking: if I need my most expensive people writing ultra-detailed specs…
- At what point is it faster for those same people to write the critical code directly?
- If AI needs very detailed instructions to do the work, is that really different from how we already work with junior developers?
- How much of this is true efficiency, and what ROI would these agents actually deliver compared with giving those same detailed specs to a junior engineer?
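The ROI question can at least be framed as back-of-envelope arithmetic. A minimal sketch, where every number is a hypothetical placeholder to be replaced with your own estimates, not data from any study:

```python
# Back-of-envelope cost comparison for one critical change.
# All numbers below are hypothetical placeholders: fill in your own
# rates and time estimates before drawing any conclusion.
senior_rate = 120.0   # $/hour for the senior engineer (hypothetical)
spec_hours = 6.0      # senior time to write an agent-ready spec
review_hours = 3.0    # senior time to review and correct agent output
direct_hours = 8.0    # senior time to just write the critical code

agent_path = senior_rate * (spec_hours + review_hours)
direct_path = senior_rate * direct_hours

print(f"spec + review path:   ${agent_path:.0f}")
print(f"write-it-directly:    ${direct_path:.0f}")
```

With these placeholder numbers the agent path comes out more expensive, not less. The point is not that result; it is that the comparison has to be made explicitly instead of assumed away by a demo.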
If the answer is “we are not sure yet, but we have a chart that says 3x productivity,” then this may not be a data-driven decision. It may just be wishful thinking.
Productivity: are we measuring delivered value or just produced code?
A lot of AI success stories follow the same script: an agent generates an entire feature in minutes and everyone applauds.
But life in production includes a few more steps:
- Understanding the problem (often the hardest part)
- Choosing the solution (weighing trade-offs and preparing for future change)
- Implementing, reviewing, testing, integrating, deploying
- Operating and maintaining the system (also known as fixing things when they fail at 3 AM)
If we aggressively speed up implementation while leaving everything else untouched, the side effects are predictable: more changes per week, more code to review, and more subtle errors that do not show up on the happy path.
Result: the bottleneck moves to code review, QA, and maintenance. The team feels “more productive” because more happens in the repo, while the business still waits for the same release that never quite stabilizes.
That leaves us with the questions that do not sound great in marketing decks, but matter before buying the story:
- Does real lead time from “we had the idea” to “the customer is using it” actually go down, or only the time spent writing code?
- What happens to bug fixing, rework, and incident handling after introducing agents?
- Are we measuring company-level productivity, or just celebrating a higher PR count?
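The first question is answerable from data most teams already have. A minimal sketch of the comparison, assuming a hypothetical issue-tracker export with illustrative field names (not from any specific tool):

```python
from datetime import datetime
from statistics import median

# Hypothetical issue-tracker export: one record per shipped change.
# Field names are illustrative, not from any specific tool.
changes = [
    {"idea": "2024-03-01", "code_start": "2024-03-10",
     "code_done": "2024-03-11", "in_production": "2024-04-02"},
    {"idea": "2024-03-05", "code_start": "2024-03-20",
     "code_done": "2024-03-22", "in_production": "2024-04-15"},
]

def days(a: str, b: str) -> int:
    fmt = "%Y-%m-%d"
    return (datetime.strptime(b, fmt) - datetime.strptime(a, fmt)).days

# Total lead time: from "we had the idea" to "the customer is using it".
lead_times = [days(c["idea"], c["in_production"]) for c in changes]
# Coding time: the only slice most AI demos actually measure.
coding_times = [days(c["code_start"], c["code_done"]) for c in changes]

print("median lead time (days):  ", median(lead_times))
print("median coding time (days):", median(coding_times))
```

If coding time turns out to be a small fraction of lead time, speeding it up further cannot move the overall number much, no matter how impressive the demo: Amdahl's law, applied to delivery.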
If the star metric is “lines of code generated by AI,” we already know who is really making money, and it is not necessarily the company.
Do the studies look like our work, or like a lab setup?
The “+50% productivity” numbers usually come from highly controlled environments: narrow tasks, clear requirements, clean code, no internal policy constraints, no legacy, no undocumented “do not touch that.”
Meanwhile, many teams live in a different reality: systems with ten years of history and zero years of documentation; multiple teams and vendors that do not collaborate as closely as they should; business rules split across code, slides, and the memory of two people.
Basic question, often skipped:
Does the type of work measured in those studies resemble what my organization does… or just the conference demo we watched?
If the answer is “not really, but the number looks good in strategy slides,” then we are buying narrative, not evidence.
Internal capability: the asset that erodes quietly
When AI writes most of the code most of the time, a few quiet things start to happen:
- Developers stop practicing the skill of navigating and changing complex systems without crutches.
- The default reflex for any change becomes “ask the agent.”
- A growing share of the system consists of code that nobody on the team has truly written, or thought through, end to end.
From a business perspective, that translates into:
- Lower resilience: if the tool changes pricing, terms, or simply loses competitiveness, internal capability is no longer where it used to be.
- Less visible technical debt: it works today, but nobody wants to be the person touching it tomorrow.
- Lock-in not just to a vendor, but to a way of working that is expensive to reverse.
Question worth asking with a two- to three-year horizon, not just next quarter:
If tomorrow I had to cut agent usage drastically due to cost, regulation, or strategy, could my team still deliver with reasonable speed and quality… or would they have to relearn core engineering under pressure?
If that picture feels unsettling, the risk already exists, even if the dashboard is all green.
Incentives: who needs this to be a revolution at all costs?
You do not need a conspiracy when incentives are this aligned:
- Model providers need “radical transformation” stories to justify funding rounds and valuations.
- AI tooling vendors need process reorganization around their product to look inevitable, not optional.
- Consultancies do better when the project is called “transformation” rather than “incremental improvement.”
And inside companies, personal careers get tied to “leading AI adoption,” so few people want to be the one in the meeting saying “let’s wait for data” while everyone else talks about “not falling behind.”
A healthy filter before buying any plan is to mentally reframe it:
If the person proposing this had zero financial or reputational exposure to AI being “revolutionary,” would they recommend the same thing with the same intensity?
If the answer is “probably not,” then the conflict of interest is already there, even if nobody writes it on the slide.
So where does it make sense to bet big today?
None of this means AI is useless. It means not everything we can do with AI has the same risk-reward profile.
The best cost-benefit cases are usually:
- Accelerating developers on repetitive tasks: tests, mechanical refactors, documentation, examples.
- Doing work that never made it into the backlog because the cost was too high: internal scripts, small automation, incremental cleanup.
- Exploring more design options before committing to one, without multiplying manual effort.
In those cases, if things go wrong, impact is limited. If things go well, upside is clear.
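A concrete instance of the "repetitive tasks" bucket: table-driven tests are exactly the kind of mechanical code an assistant can draft and a human can verify at a glance. A hypothetical example around a made-up `slugify()` helper:

```python
import re

def slugify(title: str) -> str:
    """Lowercase, replace runs of non-alphanumerics with '-', trim '-'.

    A hypothetical helper used only to illustrate table-driven tests.
    """
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

# The test table is the mechanical part: easy for an assistant to
# extend with more cases, easy for a reviewer to check line by line.
CASES = [
    ("Hello World", "hello-world"),
    ("  Spaces  everywhere ", "spaces-everywhere"),
    ("Already-slugged", "already-slugged"),
    ("Symbols!@#Here", "symbols-here"),
]

def test_slugify():
    for raw, expected in CASES:
        assert slugify(raw) == expected, (raw, expected)

if __name__ == "__main__":
    test_slugify()
    print("all cases pass")
```

The risk profile is the attraction: a wrong generated case fails loudly in CI instead of failing quietly in production, which is what "if things go wrong, impact is limited" looks like in practice.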
By contrast, reorganizing your entire operating model around assumptions like:
- “AI will write most of the code”
- “developers will become specification producers”
- “and supervision will be enough to preserve quality”
is a high-impact bet with still-limited evidence at real-world scale.
The core question is not whether AI “is the future”; marketing settled that one long ago.
The question that actually helps decision-making is this:
Are we integrating AI into software development in a way that improves returns across the whole system, without degrading key team capabilities or creating a dependency that is hard to unwind?
If the answer is yes, go ahead and use every useful tool available.
If the answer is “I am not sure, but the market seems to be moving,” the next step is not to restructure the team. The next step is to design small experiments and keep enthusiasm behind the numbers, not ahead of them.