Friday, 26 June 2026 PDT | 04:27 PM
The 1 News: Smart News for Global Indians

Distillation

AI News June 26, 2026 02:10 AM

Distillation - How Alibaba allegedly siphoned Claude's AI capabilities

Anthropic has directly accused Alibaba of siphoning off the capabilities of its Claude AI. And the craziest part of this whole story is the method that was allegedly used.

Rest assured, nobody hacked Anthropic's servers, nobody stole Claude's source code, and nobody got their hands on the model's famous "weights". According to Anthropic, the operators (bots, basically) linked to Alibaba simply chatted with Claude. And not just a little - they carried out 28.8 million exchanges over 6 weeks!

So how do you "steal" an AI just by talking to it? With a technique called distillation, and I'm going to try to explain it to you.

When you ask Claude a question, it generally gives you a very well-formulated, thorough answer. And that answer is pure gold for copycats, because it contains - in condensed form - the model's knowledge and reasoning. So if you collect millions of these question-answer pairs, you end up after a while with a massive dataset. And with that dataset, you can then train your own smaller model to imitate the responses of the more powerful one.

Basically, the big model plays the teacher, and your smaller model plays the student. The student doesn't necessarily understand how the teacher thinks, but by copying everything the teacher says, it ends up looking a lot like it. Researchers call this the teacher-student technique, and the variant allegedly used here is "black-box" distillation. In black-box mode, there's no need to crack the model open, since its responses are enough. And that's why it works even when the model on the other side is closed and only accessible via an API.
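To make the teacher-student idea concrete, here is a deliberately tiny sketch in Python. It is a toy analogy, not how anyone actually distills a language model: the "teacher" is a hypothetical hidden formula standing in for the closed model, and the "student" is a straight line fitted only to the question-answer pairs it collected. Every name in it (`teacher`, `collect_pairs`, `train_student`) is made up for illustration.

```python
import random


def teacher(x: float) -> float:
    """Hypothetical stand-in for the closed model: callers only see outputs."""
    return 3.0 * x + 2.0


def collect_pairs(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Query the teacher n times and keep every (question, answer) pair."""
    rng = random.Random(seed)
    prompts = [rng.uniform(-10.0, 10.0) for _ in range(n)]
    return [(x, teacher(x)) for x in prompts]


def train_student(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Fit y = w * x + b to the collected pairs by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


# The student never reads the teacher's code, only its answers.
pairs = collect_pairs(1000)
w, b = train_student(pairs)


def student(x: float) -> float:
    """The imitation model recovered from the teacher's answers alone."""
    return w * x + b


# Compare the two on a question the student was never trained on.
gap = abs(student(42.0) - teacher(42.0))
```

The point of the toy is the information flow: `train_student` only ever touches `pairs`, never the inside of `teacher`. With a real language model the pairs are text and the student is a neural network, but the black-box constraint is the same.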

There's just one small detail though… No API in the world is going to let you casually fire off 28.8 million requests from a single account. There are quotas, rate limits, anti-abuse systems everywhere. So, according to Anthropic, the operators created around 25,000 fake accounts to muddy the waters - that way each account does its little share of the work, the traffic looks like thousands of regular users, and boom, they grab the data without anyone noticing! It's this large-scale camouflage that leads Anthropic to say this is the biggest attack of its kind they've ever seen, carried out according to them by operators linked to Alibaba and its Qwen lab.
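A quick back-of-the-envelope calculation shows why spreading the load works. The sketch below uses only the figures quoted in this article (28.8 million exchanges, around 25,000 accounts, 6 weeks) and assumes the traffic was split evenly across accounts and days, which is my simplification and not something Anthropic has stated.

```python
TOTAL_EXCHANGES = 28_800_000  # figure quoted in the article
ACCOUNTS = 25_000             # "around 25,000" fake accounts
DAYS = 6 * 7                  # 6 weeks


def per_account(total: int, accounts: int) -> float:
    """Exchanges each account handles if the load is split evenly (assumption)."""
    return total / accounts


def per_account_per_day(total: int, accounts: int, days: int) -> float:
    """Average daily exchanges per account under the same even-split assumption."""
    return per_account(total, accounts) / days


share = per_account(TOTAL_EXCHANGES, ACCOUNTS)               # 1152.0
daily = per_account_per_day(TOTAL_EXCHANGES, ACCOUNTS, DAYS)  # about 27.4
print(f"{share:.0f} exchanges per account, about {daily:.1f} per day")
```

Under that assumption, each account sends roughly 27 messages a day, which looks a lot like an ordinary, fairly active user.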

And this is far from the first time. Remember, back in February, Anthropic had already spotted the same scheme at DeepSeek (150,000 exchanges), Moonshot AI (3.4 million) and MiniMax (13 million). Before that, at the start of 2025, OpenAI already suspected DeepSeek of dipping into its models' responses, and described hidden third-party routers being used to bypass its blocks. In short, it's always the same pattern. AI copying even has its own in-house variants, as we saw with the Pangu scandal at Huawei, which stayed between Chinese players.

And the real problem for Anthropic, OpenAI, and the others is that there's almost nothing they can do about it. An AI's product is precisely its responses. You can't sell answers while also preventing people from reading and storing them. The labs are working on countermeasures (output watermarking, rewriting reasoning traces to obscure the trail, that kind of thing), but for now it's all just a temporary patch.

That said, don't go thinking distillation is inherently "dirty". It's an extremely common and perfectly legitimate technique for building small, fast models that run on your laptop. But what changes everything here is consent - distilling your own large model is fine, but quietly distilling your neighbor's via fake accounts is still pretty low.

Still, I haven't forgotten that these giant models gorged themselves on the entire web without asking anyone's permission - so seeing them get siphoned in turn to feed open source models, I can't help but see that as a bit of karmic justice...

We'll see what the courts make of it...