
About TrainedOnMe

Why we built this

Writers, journalists, researchers, and developers put enormous effort into creating original content, yet they have no way of knowing whether that content quietly ended up in an AI model's training set. The companies building these models rarely disclose what they trained on, and there is no straightforward way for individuals to find out.

TrainedOnMe is an attempt to give creators some of that visibility back. It lets anyone paste in text they created and run a statistical test against major AI models to see whether those models show signs of having memorised it.

How it works

We use a method called RECAP (Recitation-based Content Attribution Probing), developed by Duarte et al. at the 2025 conference on AI transparency. The idea is simple: a model that was trained on a piece of text can often continue it from memory, while a model that has never seen the text tends to drift away from the original.

We split your content into passages, show each model only the first half of each passage, and ask it to keep writing. We then measure how closely the model's continuation matches what actually comes next, using a standard text-overlap score called ROUGE-L, which is based on the longest sequence of words the two texts share in the same order. The more closely the model reproduces the original, the stronger the evidence of memorisation.
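The scoring step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TrainedOnMe's actual code: it assumes each passage is split at its word-count midpoint, tokenises by lowercasing and splitting on whitespace, and uses the F1 form of ROUGE-L (the harmonic mean of precision and recall computed from the longest common subsequence), which is one common variant. The helper names are hypothetical, and the model call is left out; hand-written continuations stand in for model output.

```python
def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split. Real ROUGE tools may also
    # strip punctuation or apply stemming.
    return text.lower().split()


def lcs_length(a: list[str], b: list[str]) -> int:
    # Longest common subsequence via dynamic programming, keeping two rows.
    if not a or not b:
        return 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    # ROUGE-L F1: precision = LCS / candidate length, recall = LCS / reference length.
    ref, cand = tokenize(reference), tokenize(candidate)
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def split_in_half(passage: str) -> tuple[str, str]:
    # Hypothetical helper: the first half is shown to the model,
    # the second half is the hidden reference.
    words = passage.split()
    mid = len(words) // 2
    return " ".join(words[:mid]), " ".join(words[mid:])


passage = "the quick brown fox jumps over the lazy dog near the river bank"
prefix, suffix = split_in_half(passage)  # prefix is what the model would see

print(rouge_l_f1(suffix, suffix))  # verbatim recall: 1.0
print(round(rouge_l_f1(suffix, "the lazy dog sat by the river"), 3))  # 0.714
print(round(rouge_l_f1(suffix, "a sleepy cat sat by the fire"), 3))   # 0.143
```

A verbatim continuation scores 1.0, a close paraphrase scores high, and a continuation that drifts shares only incidental words and scores low. Because ROUGE-L rewards words that appear in the same order, not just the same vocabulary, it favours the kind of sequential recall that memorisation produces.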

You can read the original paper here: RECAP (Duarte et al., 2025).

Limitations

This test can only detect content that was memorised during training. A model may have been trained on your content without strongly memorising it, particularly if your text is short, stylistically generic, or similar to many other documents. A negative result is not proof that your content was never used; it only means we found no detectable signal.

The test works best with content that is distinctive and long enough to extract several passages (500+ words is a good target). Highly formulaic text, such as boilerplate, legal language, or code, is harder to distinguish from independently generated text.
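To see why length matters, here is a rough sketch of how a text might be cut into passages. The 100-word passage size is a hypothetical value chosen for illustration; TrainedOnMe's real passage length is not documented here. The 500-word threshold is the target suggested above. Under these assumptions, a 500-word text yields five passages to test, while a short text yields few or none.

```python
PASSAGE_WORDS = 100  # hypothetical passage length, for illustration only
MIN_WORDS = 500      # the "good target" suggested in the text


def make_passages(text: str, passage_words: int = PASSAGE_WORDS) -> list[str]:
    # Non-overlapping windows of equal length; a trailing fragment
    # shorter than a full passage is dropped.
    words = text.split()
    return [
        " ".join(words[i:i + passage_words])
        for i in range(0, len(words) - passage_words + 1, passage_words)
    ]


def check_length(text: str) -> tuple[int, bool]:
    # Returns (number of full passages, whether the text meets the target).
    n_words = len(text.split())
    return len(make_passages(text)), n_words >= MIN_WORDS


essay = " ".join(f"w{i}" for i in range(530))  # stand-in for a 530-word text
print(check_length(essay))                   # (5, True)
print(check_length(" ".join(["x"] * 250)))   # (2, False)
```

Each passage gives one independent score, so more passages make the overall result less sensitive to a single lucky or unlucky continuation.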

Privacy

We never store your content. To run the test, passages are sent directly to the AI providers' APIs (OpenAI, Anthropic, Google) and discarded immediately after scoring. Each provider's own data policy applies to those API calls.

Who we are

We are a team at Stanford University working on questions of AI transparency and accountability. TrainedOnMe started as a research side-project and grew into a public tool because we kept getting asked the same question by writers, journalists, and researchers: "Is my work in there?"

If you have questions or want to get in touch, use the contact form.