
ANALYSIS: Big Tech sets AI to catch AI

By Nicola Mawson, Contributing journalist
Johannesburg, 21 Apr 2026
Companies are treating advanced AI as critical infrastructure rather than just software. (Graphic: Nicola Mawson | Freepik.)

As hackers use AI to bypass defences – including in one of the largest government breaches on record – companies like Anthropic and OpenAI are holding back tools they now regard as defence systems.

Cyber security firm Gambit recently said it had analysed the attack path behind what it says is one of the largest government breaches on record: the compromise of Mexico’s tax authority and at least eight other organisations.

KNOW MORE

Cyber security professionals can join hundreds of industry peers at ITWeb Security Summit Cape Town 2026 and ITWeb Security Summit 2026 in Johannesburg, where expert speakers will explore how organisations can stay resilient in the face of AI-driven attacks and an increasingly complex threat landscape.

Within a month, nine institutions were affected, exposing 195 million identities and tax records, as well as 15.5 million vehicle registry records, including licence plates, names, taxpayer IDs and addresses, says Gambit.

The use of AI in cyber attacks isn’t new: hackers were using it before 2020 for spam evasion and basic automation. It only became mainstream in 2024, when AI-driven code generation and attack tooling came to be treated as established risks.

By this year, attackers were using AI to scale and accelerate cyber crime, which extends from generating code and automating attacks, to crafting convincing phishing and deepfake scams. The AI Incident Database lists more than 7 000 incidents in which AI was used as a hacking tool.

195 million reasons to worry

In Mexico, Gambit found that the attackers also extracted 295 civil records of births, deaths and marriages, almost six million property owner records, an additional 2.28 million property records, and other sensitive data.

The operation used more than 1 000 AI prompts, passing information to a second AI platform for analysis, says Gambit, noting that guardrails were bypassed within about 40 minutes.

“The attacker was not a nation state. This was a small group of individuals directing AI as an operational team that found and exploited vulnerabilities, built exfiltration tools, bypassed defences, elevated privileges, established back doors, and even analysed data along the way to help move laterally to gain administrative control of more systems and to exfiltrate more data,” says Gambit.

Turning AI on its head

Gambit’s report has been overshadowed by recent news that Anthropic’s Mythos tool uncovered a decades-old flaw in OpenBSD. The tool didn’t just find weaknesses in Mozilla Firefox’s JavaScript engine; it repeatedly turned them into working attacks, proving they were genuinely exploitable.

An article in The Conversation, cited under a Creative Commons licence, says it is “significant” that Anthropic claims Mythos has uncovered software vulnerabilities and bugs “in every major operating system and every major web browser”. Mythos “excels at completing complex, multi-step cyber security tasks,” according to Pluralsight.

Jacqui Muller, Belgium Campus iTversity researcher and PhD candidate in computer science, says Mythos is not “some unstoppable hacker, but AI has clearly crossed a threshold where it can systematically find and potentially exploit software weaknesses faster than humans”.

An article on the World Economic Forum’s blog yesterday concurs. “Frontier AI systems are becoming more autonomous and powerful, but also harder to control once deployed.”

Cyber security failures have real-world consequences, professor Stan Karanasios and associate professor Saeed Akhlaghpour, both at the University of Queensland, write for The Conversation.

Colour classifications of incidents in AI-driven cyber attacks. (Source: AI Incident Database)

“In Australia, the Optus breach exposed the personal information of about 9.5 million people. In another case, stolen Medibank records included sensitive health information, and some of the data was later released on the dark web. These incidents were not just database problems. They became crises of privacy, identity and trust,” they state.

Set a thief to catch a thief

AI companies are responding by holding back models that could be used in attacks, instead deploying them in defence systems that detect vulnerabilities, flag phishing and scam activity, and identify abnormal behaviour across networks in real time.
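One common building block of such anomaly detection – sketched here purely for illustration, not as any vendor’s actual implementation – is a statistical baseline check: learn what “normal” looks like for a network metric, then flag observations that deviate too far from it. The function name, the threshold, and the sample figures below are all hypothetical.

```python
import statistics

def flag_anomalies(baseline, observed, threshold=3.0):
    """Flag observations that deviate from the baseline mean by more
    than `threshold` standard deviations (a simple z-score test)."""
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline)
    if stdev == 0:
        # A flat baseline: anything different from it is anomalous.
        return [value != mean for value in observed]
    return [abs(value - mean) / stdev > threshold for value in observed]

# Baseline: typical requests-per-minute from one host during a quiet week.
baseline = [100, 104, 98, 102, 99, 101, 103, 97]
# Observed: the same host during a suspected exfiltration window.
observed = [101, 99, 450, 102]

print(flag_anomalies(baseline, observed))
# → [False, False, True, False]: only the 450-rpm spike is flagged
```

Real deployments replace this single z-score with learned models over many signals at once, but the principle is the same: the system alerts on behaviour that breaks the statistical pattern, rather than matching known attack signatures.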

In early April, Anthropic said it would not release Mythos publicly, citing its capabilities and associated risks, while disclosing its offensive potential. Instead, it launched Project Glasswing to turn Mythos into a cyber defence tool.

The initiative brings together companies including Microsoft, Amazon, Google, Apple, Cisco and NVIDIA, as well as the Linux Foundation and JPMorganChase.

Muller says initiatives like Project Glasswing aim to prioritise defensive use before such capabilities become widespread. “The more important takeaway is strategic; leading AI labs are now deliberately restricting access to their most powerful models because of dual-use risk.”

Inside track

Karanasios and Akhlaghpour say “the idea is to give defenders a head start to find and fix weaknesses in critical software before similar AI capabilities become widely available to attackers”.

The authors note this is not the first time an AI firm has withheld a model. “In 2019, years before the ChatGPT era, OpenAI did something similar with its (now quite primitive-looking) GPT-2 model.”

ITWeb asked ChatGPT whether OpenAI was holding back models specifically to fight cyber crime. It said the company does not maintain a separate, secret crime-fighting model.

How AI is being used to fight AI-driven crime. (Source: Created with GenAI)

Yet, “there are limits on what gets released. Some capabilities are deliberately not exposed publicly. For example, things that could be easily abused for hacking, malware generation, or social engineering at scale. That’s a safety boundary, not a hidden parallel AI arms race,” says ChatGPT.

The GenAI tool adds that some models are restricted to vetted partners such as governments, researchers, or enterprise clients. “That’s usually about safety, misuse risk, or early testing, not because they’re exclusively deployed to fight cyber crime,” it says.

Double-edged sword

Karanasios and Akhlaghpour say Mythos highlights a shift. They describe it as a double-edged sword: it could help organisations uncover hidden flaws but also raises the risk that attackers could do the same first.

“Mythos and other AI models like it could change the basic economics of cyber security,” say Karanasios and Akhlaghpour.

The authors say this raises questions about “who gets access to powerful AI models, who oversees their use, and who decides what counts as the ‘right hands’.”

Regardless of the intention behind holding back models, Muller says this “signals a shift toward treating advanced AI as critical infrastructure rather than just software”.

“The reality is a bit less cloak-and-dagger than it sounds,” ChatGPT says. There is no secret “elite anti-cyber crime AI,” but more powerful applications exist behind the scenes, shaped by context, data access and controlled deployment.
