Two years ago I started building AI agents for marketing teams. Back then it felt experimental. Today it's a core part of how I work with clients — and I've built and retired enough agents to know which ones actually earn their keep and which ones don't.
This is that list. The agents I keep running, the ones I've stopped using, and why the difference almost never comes down to the model underneath.
Why most agent stacks stop working.
Here's the pattern I see when I walk into a company that's been trying to do this themselves: fifteen half-built workflows, three abandoned dashboards, and a general feeling that "AI is supposed to be saving us time but somehow everyone's busier."
The problem isn't the models. The models are fine. The problem is stack discipline — or more accurately, the lack of it. Companies build agents because they can, not because they should. No review loop. No retirement criteria. No one accountable for whether the agents are actually producing output the team uses.
The question isn't which agents should we build. It's which agents will still be earning their keep six months from now.
Most of my job, honestly, is killing agents. Clients come to me with an "AI automation" setup that has ballooned, and we spend the first few weeks deciding what to switch off. The answer is almost always the same: more than half of it.
The agents I keep running.
Across most of my client engagements, there's a core set of four to six agents that survive. Not the same ones every time — each business has its own needs — but the shape is consistent. These are the ones I recommend by default.
Research agents.
Specifically: research agents that feed the content team. The job isn't "find me information" (too vague, too broad) but "produce a structured brief on this specific topic, drawing on these specific source types, in this format, by Monday morning."
Good research agents replace the 2–4 hours a content marketer spends gathering material before they can write. They don't replace the writing. That's the important distinction.
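To make "specific" concrete, here's a toy Python sketch of the difference between a research task an agent can actually run and one it can't. The field names are mine, invented for illustration, not a format any tool requires.

```python
# The difference between a job an agent can do and one it can't,
# written as task specs. Field names are illustrative only.
too_vague = {"task": "find me information about AI in B2B marketing"}

scoped = {
    "task": "research brief",
    "topic": "How mid-market B2B teams are using AI agents for reporting",
    "source_types": ["vendor changelogs", "practitioner blog posts", "earnings call transcripts"],
    "output_format": "one-page brief: 5 findings, each with a source link and a one-line 'so what'",
    "deadline": "Monday 09:00",
}

for spec in (too_vague, scoped):
    # A research task is only runnable if it says what to look at, what to produce, and by when.
    enough_to_run = all(key in spec for key in ("topic", "source_types", "output_format", "deadline"))
    print(spec.get("topic", spec["task"]), "->", "runnable" if enough_to_run else "too vague to brief")
```

If the spec can't be filled in, the problem isn't the agent. It's that nobody has decided what the brief is for.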
Competitive intel agents.
These are the agents clients love once they've experienced them. A competitive-intel agent watches specific competitors — their blog, pricing page, careers page, product changelog — and flags when something meaningful changes. Not "they published something," but "they published something that matters to us, and here's why."
The filter is what makes these work. A naive "send me everything" setup produces noise. A properly tuned agent sends maybe two alerts a week, each one genuinely actionable.
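If you want a picture of what "properly tuned" means, here's a minimal Python sketch of the filter-first idea. It assumes something upstream already diffs competitor pages and hands over a change record; the watched pages and keywords are placeholders, not my production rules.

```python
# A minimal sketch of the "filter first" idea, not a production watcher.
# Assumes an upstream job already detects page changes and summarizes them.
from dataclasses import dataclass

@dataclass
class Change:
    competitor: str
    page: str       # e.g. "pricing", "blog", "careers", "changelog"
    summary: str    # one-line description of what changed

# Placeholder relevance rules: the pages and themes this particular team cares about.
WATCHED_PAGES = {"pricing", "changelog"}
WATCHED_KEYWORDS = {"enterprise", "api", "free tier", "acquisition"}

def worth_an_alert(change: Change) -> bool:
    """Return True only when a change is likely to matter to us."""
    if change.page in WATCHED_PAGES:
        return True
    text = change.summary.lower()
    return any(keyword in text for keyword in WATCHED_KEYWORDS)

changes = [
    Change("Acme", "blog", "Published a listicle about productivity"),
    Change("Acme", "pricing", "Removed the free tier, added a usage-based plan"),
]
for alert in (c for c in changes if worth_an_alert(c)):
    print(f"[{alert.competitor}] {alert.page}: {alert.summary}")
```

The blog listicle never reaches anyone's inbox; the pricing change does. That suppression step, not the monitoring, is where the value lives.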
Reporting agents.
The weekly-update agent is the one every marketing team wishes they'd had five years ago. It pulls from analytics, Search Console, CRM, and wherever else the data lives, then writes a narrative summary — not a dashboard — of what moved, what didn't, and what might matter for next week.
Tuned well, a reporting agent replaces 3–4 hours of someone's Friday afternoon. Tuned poorly, it's a spam generator.
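The core loop is less exotic than it sounds. Here's a rough Python sketch with the metric pulls stubbed as plain numbers (a real setup reads from analytics, Search Console, and the CRM) and the final narrative left to a model rather than string formatting. Only moves above a threshold make it into the update, which is the whole trick.

```python
# A rough sketch of the reporting agent's core loop.
# Assumption: metric pulls are stubbed here; a real setup queries the actual sources.
this_week = {"organic_sessions": 14200, "demo_requests": 38, "newsletter_signups": 410}
last_week = {"organic_sessions": 12800, "demo_requests": 41, "newsletter_signups": 405}

def notable_moves(current: dict, previous: dict, threshold: float = 0.05) -> list[str]:
    """Keep only week-over-week changes big enough to be worth a sentence."""
    moves = []
    for metric, value in current.items():
        prior = previous.get(metric)
        if not prior:
            continue
        change = (value - prior) / prior
        if abs(change) >= threshold:
            direction = "up" if change > 0 else "down"
            moves.append(f"{metric.replace('_', ' ')} {direction} {abs(change):.0%} ({prior} -> {value})")
    return moves

# Everything below the threshold stays out of the update entirely.
print("What moved this week:")
for point in notable_moves(this_week, last_week):
    print(f"- {point}")
```

The threshold is the knob that decides whether the output reads like a Friday summary or like spam.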
Brief generators.
These sit next to research agents but do something different: they produce actual content briefs, ready for the content engine to run on. Topic, angle, structure, keywords, internal links, tone notes. The brief generator is the agent that ties strategy to production — and it's the one that breaks most often when strategy shifts.
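It helps to pin down what a brief actually contains before you build the agent that produces it. Here's one possible shape, sketched as a Python dataclass; the field names are mine, not a standard, and most clients add or drop a few.

```python
# One way to pin down what a brief generator has to produce.
# The fields mirror the list above; the names are illustrative, not a spec.
from dataclasses import dataclass

@dataclass
class ContentBrief:
    topic: str
    angle: str                  # the argument this piece makes, not just the subject
    structure: list[str]        # ordered section headings
    target_keywords: list[str]
    internal_links: list[str]   # existing URLs this piece should point to
    tone_notes: str = ""
    assigned_to: str = ""       # a brief nobody owns is a brief nobody writes

brief = ContentBrief(
    topic="AI agents for B2B marketing teams",
    angle="Most agent stacks fail from lack of discipline, not lack of models",
    structure=["Why stacks bloat", "The agents worth keeping", "The retirement ritual"],
    target_keywords=["marketing ai agents", "ai agent stack"],
    internal_links=["/blog/marketing-automation-audit"],
    tone_notes="First person, practitioner voice, no hype",
)
print(brief.topic, "->", brief.angle)
```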
Across most engagements, I end up running 3–6 agents per client. Not twenty. Not fifty. The marginal value of adding a seventh agent is almost always lower than the cost of maintaining it. That's not a limitation — it's a design principle.
The agents I've stopped using.
And now the interesting half. These are agents that sounded good, got built, and got switched off — either by me or by the client after six months of honest data.
- "Social listening" agents. In theory, monitor mentions and surface brand sentiment. In practice, the signal-to-noise ratio never earned its keep. I've yet to see a social-listening agent produce an insight that actually changed a marketing decision.
- Auto-publishing agents. The idea: have the AI draft content and push it live with minimal review. The reality: the liability of one bad piece going out unsupervised outweighs the time savings. I now always insert a human approval step. Every time.
- "Insight" summarizers. These chew through data and produce monthly "insights." The problem is that the insights are almost always obvious, vague, or both. A good marketer would have spotted the same thing faster.
- Email drafting agents that tried to be clever. First-gen versions sent "personalized" outreach that was obviously AI-written and damaged brands. I still use email agents — but only for researched, high-effort drafts that are then edited by a human. The "write me 500 cold emails" use case is over.
- Generic "marketing assistant" agents. The dream of one agent that does everything. Always worse than three focused agents doing specific things. This is the single most common failure mode I see.
- SEO audit agents that ran on a schedule. Sounds good. In practice they produced 400-page PDFs nobody read. I now use audit agents on-demand — when there's a specific question to answer — and that's it.
If you can't explain what an agent produces, who reads it, and what decision it triggers — it's not an agent. It's just a process running.
The discipline that makes the difference.
Beyond the specific agents, here's the part that actually separates stacks that work from stacks that don't: a retirement ritual.
Every three months, I sit down with a client and we look at each running agent against three questions:
- Did this agent produce output someone actually used this quarter?
- If we switched it off today, would anyone notice within two weeks?
- Has the problem it was built to solve changed enough that it needs re-scoping?
Any agent that comes back "no" on all three is a candidate for retirement. No sentiment. No "but we just built it." Agents that don't earn their keep get switched off, full stop.
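For the readers who like things written down, here's the same check as a trivial Python sketch. The agent names and answers are made up; the point is the decision rule: three nos means retirement, and a changed problem means re-scoping rather than rebuilding.

```python
# The quarterly check, written down as a checklist rather than a vibe.
# Agent names and answers are invented; each list answers the three questions in order.
RETIREMENT_QUESTIONS = [
    "Did it produce output someone actually used this quarter?",
    "Would anyone notice within two weeks if we switched it off?",
    "Has its problem changed enough to need re-scoping?",
]

agents = {
    "competitive-intel": [True, True, False],
    "seo-audit-weekly": [False, False, False],
}

for name, answers in agents.items():
    if not any(answers):
        print(f"{name}: retire")       # no on all three
    elif answers[2]:
        print(f"{name}: re-scope")     # still useful, but the problem moved
    else:
        print(f"{name}: keep")
```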
That's the thing most AI automation setups lack. Nobody's paid to switch things off. Everyone's paid to build more. So stacks grow, and most of what grows is dead weight.
What I'd tell you if you're starting now.
Pick three agents. Not ten. Three.
Pick them based on the actual bottlenecks your team has, not the ones you read about in a tech blog. Research, reporting, and competitive intel make a defensible starting three for most B2B marketing teams, and I'd bet on that combination over almost any other.
Build them with a clear "what does the output look like and who reads it" answer before you start. If you can't answer that, don't build the agent.
And schedule the retirement check-in before you schedule anything else. Three months from now, you'll want someone to tell you honestly what's working. That someone is usually me, but it doesn't have to be. It just has to be someone.
The stack that matters isn't the one that does the most. It's the one that still works six months from now.