
AI pentest tools: I tried six in our pipeline. Two stayed.

Posted by topform

Every other vendor pitch I get this year opens the same way. "Our platform uses AI agents to autonomously discover vulnerabilities…" I started keeping a folder. It's full now.

So we ran an experiment at Wattlecorp. We took six AI-driven pentest or vulnerability-discovery tools — a mix of well-funded startups and big-name additions to existing platforms — and put them through real engagements over a quarter. Not benchmarks. Real client work, alongside our human team. The brief to my testers was simple: pretend the AI tool is a junior consultant who joined this week. Use it. Tell me what you actually used it for, and what you stopped using it for.

Two stayed in the workflow. Four didn't.

I'm not naming the four. Some of them will be fine in a year, some won't. Naming them now mostly serves nobody. But here's the pattern.

The four that didn't survive had the same failure mode. They were excellent at the part of pentesting that wasn't actually the bottleneck. Generating payloads, fuzzing inputs, summarising tool output, drafting findings narrative — yes, all faster than a human. But none of that is what slows a pentest down. What slows a pentest down is judgment: which of the 200 things the scanner just flagged is actually exploitable in this specific environment, against this specific business logic, with this specific compensating control already in place. That's the bit nobody had figured out, and the four tools we shelved were essentially highly automated junior testers — productive at the easy parts, useless at the parts that matter, and confidently wrong often enough that a senior had to re-verify everything anyway. Net cost: positive. Net value: zero or negative.

The two that stayed are doing one of two things really well.

The first is scoped reconnaissance and asset discovery. We're a services firm; every engagement starts with mapping what the client actually has. AI tools that ingest a domain, a subnet, a code repo, or a cloud account and surface a clean attack surface map — including things the client themselves had forgotten about — save us hours per engagement. Not because the underlying capability is new (most of this is just orchestration over Amass, Subfinder, Nuclei, and friends), but because the tool gets a junior tester to a useful starting picture in fifteen minutes instead of half a day. We pay for that.
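The orchestration layer those tools provide is not magic. A minimal sketch of the core pattern, assuming nothing vendor-specific: run several recon tools, then merge and deduplicate their output into one asset list. The tool invocations here are simulated with hardcoded lists; a real pipeline would shell out to amass and subfinder (the commented commands are illustrative, not exact flags to copy).

```python
import subprocess


def run_tool(cmd: list[str]) -> list[str]:
    """Run a recon tool and return one hostname per line of output."""
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def merge_assets(*tool_outputs: list[str]) -> list[str]:
    """Normalise, deduplicate, and sort hostnames from several tools
    into a single attack-surface list."""
    seen = {h.strip().lower() for out in tool_outputs for h in out if h.strip()}
    return sorted(seen)


# In a real pipeline (illustrative commands, check your tool's docs):
#   amass_hosts = run_tool(["amass", "enum", "-passive", "-d", domain])
#   subfinder_hosts = run_tool(["subfinder", "-silent", "-d", domain])
# Simulated output from two tools with overlapping results:
amass_hosts = ["app.example.com", "Legacy.example.com", "vpn.example.com"]
subfinder_hosts = ["app.example.com", "staging.example.com"]
print(merge_assets(amass_hosts, subfinder_hosts))
```

The value is not in this logic, which any junior could write; it's that the good tools do this across domains, subnets, repos, and cloud accounts in one pass.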

The second is reporting. Not "AI-generated findings" — please, no. But once a senior tester writes the technical body of a finding, the AI tool is genuinely good at producing the executive summary, the impact paragraph the CISO will actually read, and the remediation language that fits the client's documentation style. It's also fast at translating into Arabic for our UAE clients without us paying a translator. We caught two hallucinated severity ratings in the first month and put guardrails around it; since then, it's been a quiet productivity win. Boring use case. Real value.
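For what it's worth, the guardrail we added is mechanically trivial. A sketch of the idea, not our exact implementation: the AI drafts the prose, but its severity label is checked against the CVSS v3 score the senior tester assigned in the technical body, and any mismatch gets flagged for human review rather than shipped.

```python
# CVSS v3.x qualitative rating bands (from the CVSS specification).
CVSS_BANDS = {
    "Critical": (9.0, 10.0),
    "High": (7.0, 8.9),
    "Medium": (4.0, 6.9),
    "Low": (0.1, 3.9),
}


def severity_matches_score(ai_severity: str, cvss_score: float) -> bool:
    """Return True only if the AI-drafted label agrees with the
    tester-assigned CVSS score; False means 'hold for human review'."""
    band = CVSS_BANDS.get(ai_severity)
    return band is not None and band[0] <= cvss_score <= band[1]


severity_matches_score("High", 8.1)      # label agrees with the score
severity_matches_score("Critical", 6.5)  # hallucinated upgrade, flag it
```

The point is less the code than the posture: the model never gets the last word on anything a client will make a risk decision from.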

Here's what I'd watch out for if you're being sold one of these: three honest questions.

The first is whether the tool is autonomous or assistive. Most useful AI in security work is assistive — it makes a senior faster. The "autonomous" framing exists because that's what raises money, not because it's what works in 2026. Ask the vendor what the human is doing during the autonomous run. If the answer is "supervising," it's assistive. Price it accordingly.

The second is whether they can show you a finding the AI surfaced that a human would have missed. Not "found faster" — missed. In our six-tool experiment, exactly zero of them produced a finding our humans wouldn't have eventually reached. They saved time on the path to known classes of bugs. They didn't expand the bug surface.

The third is what happens when the model is wrong. Vendors love showing the demo where it works. Ask for the loss curve. How often does it produce a confident finding that's not real? In our trial, false-positive rates ranged from "annoying" to "completely unusable." This number gets buried in vendor decks for a reason.

So where does that leave us? Cautiously optimistic, with two AI tools as billable line items and four pilot agreements quietly not renewed. The thesis I'm running on is that AI is genuinely changing the speed of pentesting and the floor of what a junior can produce — both real, both economically meaningful — but it has not yet changed what makes pentesting valuable. The valuable part is still a human who's seen this kind of system fail before, sitting with the data, asking what's the worst thing that could happen, and not being satisfied with the obvious answer.

That's the part I'm still happy to pay senior money for. We'll revisit the experiment in Q3.

Zuhair runs Wattlecorp Cybersecurity Labs. We do offensive security across the Gulf and India.



Copyright © Topform Cybersecurity Blog: Navigating Tech Trends & Digital Security Since 2007. All rights reserved.
