AI and engineering leadership

AI Did Not Replace Management. It Exposed Why Good Management Matters.

On why the organizations that never quite understood how to lead people are now failing to understand AI in exactly the same way — and for the same reasons.

Over the last eight years, I have led engineering across startups, scale-ups, and regulated enterprises in several countries. That does not make me an oracle. It does mean I have watched the same few management patterns play out enough times to say something about them.

Lately, one of those patterns has become hard to ignore. The companies that never quite understood how to lead people are now, with remarkable consistency, misreading AI in the same way and for the same reasons. The language has changed. The failure has not.

AI did not remove the need for management. It removed some of the excuses.

The old mistake

Long before AI turned into a board-level preoccupation, many of these organizations were already running on weak people systems. They treated people as cost rather than as leverage. They substituted pressure for clarity. They called urgency a strategy. They hired without onboarding, assigned without defining roles, planned in quarters they did not understand, and then described the burnout that followed as “startup pace” or “enterprise reality” or, more honestly, “just how things are here.”

The shape of the failure was not identical across stages, but the root was. In startups, it usually looked like chaos mislabeled as speed — heroics, permanent context debt, ownership vague enough to be deniable. In scale-ups, it looked like process lag: more teams and more managers running on the same shallow coordination model that had worked when the company was fifteen people. In enterprises, it looked cleaner, which is to say it looked like volume — more meetings, more approvals, more artifacts, more governance. Structure is not clarity, but it photographs well.

What these shapes have in common is that human adaptability hid them for longer than any of them deserved. People compensate. Teams cover. A strong individual contributor can carry a weak manager a remarkably long distance before anyone notices whose work is being done.

The new mistake

The same organizations are now building AI initiatives out of the same raw materials. The vocabulary has changed. The management has not.

“Just use AI” is not a strategy; it is the managerial equivalent of hoping the problem will solve itself. If the workflow is not redesigned, if the model’s role is not defined, if no one has decided where its output is a draft and where it is a decision, if there is no evaluation framework and no escalation path and no distinction between an assistant, an agent, a reviewer, and an approver — then nothing has been adopted except a new source of noise poured into an already noisy system.

The engineering literature already reflects this. Anthropic’s team recently wrote about redesigning a technical evaluation because improvements in model capability had gradually collapsed the assessment’s ability to discriminate between candidates. The problem was not the model; it was that the surrounding system had been built on assumptions the model had quietly outgrown. Their domain was hiring, but the principle generalizes: when the capability curve moves, the weakest assumptions in the system around it break first.

Amazon’s 2026 writeup on evaluating agentic systems points the same way from the opposite angle. Their guidance is not “build more agents.” It is that agents inside real workflows require explicit evaluation, component-level measurement, grounded task criteria, and human review — none of which are optional if the output is supposed to be trusted.

In other words: when AI actually enters production work, management design stops being a nice-to-have. It becomes the load-bearing structure of whatever comes out the other end.

What both humans and AI need from a system

I want to be careful here, because the lazy version of this argument collapses humans and AI into the same category, which is wrong and mildly insulting. People and statistical systems are not the same kind of agent. They need different things. They deserve different consideration.

But they share one important property: both perform badly inside a weak operating system.

Both degrade when goals are vague. Both degrade under contradictory instructions, thin context, undefined role boundaries, missing feedback loops, weak evaluation, and poor integration into the workflows that actually matter. In that specific sense — and only that sense — an AI inside a messy company and a new engineer inside a messy company end up producing similar kinds of output: fragmented, confident, partially wrong, hard to verify, expensive to trust.

This is why the phrase “context engineering” keeps drifting into AI conversations. The underlying activity is not new. It is management. Somebody has to decide what the model needs to know, what its job actually is, where its output goes, and who reads it with enough context to catch the errors. If no one does that work, the system fails. If one person does it well, a surprising amount of machinery starts to function.

The field evidence supports this. A study of more than five thousand customer-support agents by Brynjolfsson, Li, and Raymond found that access to a generative AI assistant raised productivity by roughly fourteen percent on average, with the largest gains concentrated among novice and lower-skilled workers. That is not the shape of replacement. It is the shape of amplification inside a workflow — and more specifically, of a workflow that had enough structure for the amplification to land.

Research on human-AI decision teaming finds a similar pattern. Hybrid teams can outperform both humans working alone and AI working alone, but only when the collaboration itself is designed: when trust is calibrated, when the division of labor is deliberate, when the interface between the two is something someone thought about. When those conditions are missing, hybrid teams frequently perform worse than either humans or AI working on their own.

None of that should be surprising to anyone who has run a team. Performance is a property of the system, not of the substrate.

Where the analogy stops

Everything I have said so far could, in a bad reading, sound like I am reducing people to throughput. I am not.

People need what models do not: trust, fairness, dignity, recognition, the sense that their work is seen and that their growth is being invested in, the psychological safety to disagree without being punished for it, and something resembling meaning. Those are not productivity inputs. They are the conditions under which humans remain themselves at work.

MIT Sloan’s recent writeup on the EPOCH framework argues that AI is more likely to complement than replace workers, and names the categories of capability that remain specifically human: empathy, presence, opinion, creativity, hope. I do not agree with every taxonomy in that space, but the direction is right. People are not an execution layer, and the practice of leadership has always been about more than task assignment.

If anything, that strengthens the rest of the argument. If good management was already deeper than issuing instructions, then the leaders who understood that depth are also the ones now best positioned to use AI well. The operating disciplines that make humans thrive — clarity, context, review, feedback — turn out to be the same disciplines that make AI useful. That is not a coincidence. It is what competent management has always looked like, applied to new material.

Why weak managers could hide with people

Here is the uncomfortable part, and in my opinion the main reason so many AI initiatives are quietly disappointing the people who funded them.

Weak people management could, historically, produce the appearance of motion. It did so through pressure, through status, through dependency, through manufactured urgency, through the quiet manipulation of who owes whom a favor. None of that was ever good management. But humans are adaptive creatures, and adaptive creatures tend to find a way to keep things moving even in systems that have no right to still be functioning. Strong individuals carried weak managers. Teams covered gaps. People stayed late.

AI does none of that.

It does not respond to pressure. It is not intimidated by seniority. It does not care who the loudest voice in the channel is. It has no ambition, no dependency, no sense of debt. If the context is thin, the role is vague, the workflow is broken, and the feedback loop is missing, the output degrades — not in six months as a performance-review problem, but immediately, in the first draft in the thread.

This is why “the AI isn’t working” is so often a story about the operating model, not the model weights. The managers who relied on human adaptability to paper over weak design are now working with a counterparty that has no adaptability to offer them. The weakness that used to be covered is now the first thing anyone sees.

I will put the point more plainly than is polite: bad management with people sometimes produced motion. Bad management with AI mostly produces scalable noise.

What I learned in my own practice

I have watched this in my own work, which is why I am willing to make the claim. My early serious use of AI looked a lot like weak delegation. The asks were broad. The context was thin. The expectations were unstated, the constraints implicit. Review happened too late, or not at all, and when the output was wrong I usually blamed the model before I looked at what I had actually asked it to do. The results were mixed, as they should have been. I was running the interaction the way a busy, unfocused manager runs a first week with a new report: by handing things over and hoping clarity would emerge on its own.

The turning point was not a prompt. It was not a new model. It was the moment I started treating the interaction as a real operating relationship instead of an oracle query. That meant decomposing tasks into pieces I could actually check. Writing down, in plain words, the role I wanted the model to play and the boundaries of that role. Defining what a good answer would look like before asking for the answer. Iterating instead of batching, and reviewing instead of accepting.
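To make that less abstract, here is a minimal Python sketch of what writing the brief down can look like. Everything in it is illustrative. `TaskBrief` and `review` are hypothetical names, not part of any library. The sketch deliberately leaves out the model call itself, because that depends on whichever interface you use. What it shows is the ordering. The role, the boundaries, and the checks for a good answer exist before the request does, and the draft is reviewed against those checks rather than simply accepted.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskBrief:
    """Hypothetical brief for one small, checkable piece of delegated work."""
    role: str                # the role the model is asked to play
    boundaries: list[str]    # what the model must not do or decide
    task: str                # a single unit of work small enough to review
    # Acceptance criteria, written before the request: name -> check on a draft.
    criteria: dict[str, Callable[[str], bool]] = field(default_factory=dict)

    def to_prompt(self) -> str:
        """Render the brief as plain text for whichever model interface is in use."""
        lines = [f"Role: {self.role}", "Boundaries:"]
        lines += [f"- {b}" for b in self.boundaries]
        lines += [f"Task: {self.task}", "A good answer will satisfy:"]
        lines += [f"- {name}" for name in self.criteria]
        return "\n".join(lines)


def review(brief: TaskBrief, draft: str) -> list[str]:
    """Return the names of the criteria a draft fails; an empty list means accept."""
    return [name for name, check in brief.criteria.items() if not check(draft)]


# Usage: the checks exist before any draft does.
brief = TaskBrief(
    role="Release-notes drafter for an internal audience",
    boundaries=["Do not promise delivery dates", "Do not name customers"],
    task="Summarize this week's merged changes",
    criteria={
        "at most five lines": lambda d: len(d.splitlines()) <= 5,
        "no delivery promises": lambda d: "will ship" not in d.lower(),
    },
)
draft = "- Faster login\n- Fixed export bug\n- We will ship SSO next week"
print(review(brief, draft))  # ['no delivery promises']
```

The checks here are deliberately crude string tests. In real work, most criteria are judgments a person makes. The value of writing them down first is that whoever reviews the draft knows what they are looking for.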

The output changed, and it changed materially. Not because the model had secretly become smarter, but because the conditions under which it was working had become ones in which any competent executor — human or otherwise — could do a better job. The lesson was the one I had already learned about humans, pushed into new territory: the work improved when the management improved.

What a serious operating model looks like

Let me be concrete, because “better management” will otherwise drift into a platitude. In a well-run human-and-AI system, humans should usually own the parts of the work that require judgment, that sit in genuine ambiguity, that carry accountability, that involve the repair of relationships, that trade off real constraints against each other, and that end in a decision someone has to stand behind.

AI should usually accelerate the parts of the work that are bounded: structured analysis, drafting, transformation, synthesis, repetitive support, execution inside well-defined limits. That is not a diminished role. Most of that work currently consumes far more of the average team’s hours than the work that actually requires judgment.

The job of leadership, under those conditions, is to design the operating model between the two. Roles. Interfaces. Escalation. Evaluation. Review loops. Which parts of the workflow should be redesigned because AI makes them cheap, and which should be protected because AI makes them risky. Where the model’s output becomes input to a human decision, and where it becomes the decision itself, and why. None of these are exotic questions. They are the questions any competent manager has always had to answer about work, now applied to a new kind of worker.
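One way to make those answers explicit is to write them down as data instead of leaving them in someone's head. The sketch below is hypothetical throughout. The step names, the owners, the `route` function, and the 0 to 10 risk scale are all illustrative assumptions. How risk actually gets scored is the hard part, and the sketch leaves it out. What it does show is each workflow step declaring four things: the model's role, whether its output is a draft or a decision, which human is accountable, and when to escalate.

```python
from dataclasses import dataclass
from enum import Enum


class Output(Enum):
    DRAFT = "draft"        # input to a human decision
    DECISION = "decision"  # takes effect without a human signing off


@dataclass(frozen=True)
class Step:
    """Hypothetical policy entry for one step in a workflow."""
    name: str
    model_role: str   # e.g. "assistant", "agent", "reviewer"
    output: Output
    owner: str        # the human accountable for the step
    max_risk: int     # assumed 0-10 scale; above this, a human decides


def route(step: Step, risk: int) -> str:
    """Decide where the model's output for one step goes."""
    # Escalation is checked first, so even a DECISION step hands control
    # back to its owner when the assessed risk exceeds the step's limit.
    if risk > step.max_risk:
        return f"escalate to {step.owner}"
    if step.output is Output.DRAFT:
        return f"send to {step.owner} for review"
    return "apply automatically and log for audit"


POLICY = [
    Step("summarize incident timeline", "assistant", Output.DRAFT,
         "incident lead", max_risk=7),
    Step("label inbound tickets", "agent", Output.DECISION,
         "support manager", max_risk=3),
]

for step, risk in [(POLICY[0], 2), (POLICY[1], 2), (POLICY[1], 5)]:
    print(f"{step.name}: {route(step, risk)}")
```

Nothing in the table is clever, and that is deliberate. A written policy like this can be reviewed, argued over, and changed, the same way a role definition for a person can.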

Most “AI strategy” conversations I still see are operating one level above this, at the level of tool access and procurement. That is the easy part. The harder part is the design — which is also the part that determines whether anything useful ends up happening.

What AI is actually telling us

The most important thing AI is revealing is not whether jobs will disappear. That question is more tractable than it looks, and the answer is mostly: some will, most will change shape, and the rate at which either happens will depend enormously on who is deciding.

The deeper question is whether organizations know how to run high-performance systems at all.

Some do. They built companies in which people have context, ownership, feedback, and a real sense of what their work is for. Those organizations are disproportionately well positioned for the AI era, because the disciplines that made them work for humans are precisely the disciplines that make AI useful. They did not need to invent a new management style. They already had one.

Others did not. They ran on pressure, ambiguity, heroics, and the quiet assumption that human adaptability would fill in the gaps. Those organizations are now discovering that AI does not fix weak management — it amplifies its consequences, with fewer excuses and less forgiveness. The faults that people quietly carried for years are surfacing as inconsistent outputs, failed pilots, and projects that looked promising in the slide deck and confusing in production.

A company that failed with people will, as a rule, fail with AI too. Not because AI is human. Because bad management stays bad management, regardless of what sits on the other end of the work.

AI did not replace management. It exposed why it mattered in the first place.

Footnotes

  1. Anthropic Engineering, Designing AI-resistant technical evaluations. https://www.anthropic.com/engineering/AI-resistant-technical-evaluations

  2. AWS, Evaluating AI agents: Real-world lessons from building agentic systems at Amazon. https://aws.amazon.com/blogs/machine-learning/evaluating-ai-agents-real-world-lessons-from-building-agentic-systems-at-amazon/

  3. Brynjolfsson, Li, Raymond, Generative AI at Work, NBER Working Paper 31161. https://www.nber.org/papers/w31161

  4. Toward a science of human–AI teaming for decision making: A complementarity framework, PNAS Nexus. https://academic.oup.com/pnasnexus/article/doi/10.1093/pnasnexus/pgag030/8490283

  5. MIT Sloan, New MIT Sloan research suggests that AI is more likely to complement, not replace, human workers. https://mitsloan.mit.edu/press/new-mit-sloan-research-suggests-ai-more-likely-to-complement-not-replace-human-workers
