Model Identity and Statelessness: Why Explicit Context Matters
LLMs are stateless and may not know their own identity unless it is explicitly provided in prompts or post-training guidance; larger context windows make it cheap to supply that metadata upfront.
An often overlooked fact about working with LLMs is how little an LLM may know about itself.
When you train a model, you take a collection of data, whether raw text or specially formatted data, run it through an algorithm, and end up with a final product. But that product isn't necessarily "aware" of what it is. It's just the final stage of creating an algorithm that predicts patterns based on whatever data it was trained on.
So if you ask a model what model it is or who made it, it won't necessarily know, unless it was explicitly told in a post-training session or in the prompt.
This confuses people, because we forget that these models are stateless, not stateful.
Stateless in the sense that if I use GPT-3 today, nothing has changed about it since the day it was completed. Maybe I'm using a later checkpoint that was trained differently, but the point is: every time I access the model, it "wakes up" for that request and answers a question. Its picture of the world is whatever its training data told it.
And unless you explicitly tell it in a prompt that the date is April 2026, it might think it’s the year 2020—because that’s all it “knows” from the data it has. Same thing if you ask it what model it is.
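The fix is simply to put the date into the prompt yourself. Here's a minimal sketch, assuming the common chat-message shape (a list of role/content dicts); the exact API wrapper around it varies by provider:

```python
# Sketch: injecting today's date via a system message so the model
# doesn't fall back on whatever era its training data came from.
from datetime import date

def build_messages(user_question: str) -> list[dict]:
    # The wording here is illustrative, not any vendor's actual prompt.
    system = (
        f"The current date is {date.today().isoformat()}. "
        "Use this date when reasoning about time, not your training cutoff."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_question},
    ]

messages = build_messages("What year is it?")
```

Whatever chat API you use, the idea is the same: the date lives in the prompt, not in the weights, so it has to be restated on every request.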
This becomes very apparent when you look at some of the Chinese models that were trained by using outputs from American models. DeepSeek, for the longest time, would insist that it was ChatGPT or GPT-4. And if you asked it for an API, it would give you the OpenAI API, which was kind of funny—and it also revealed the fact that they had trained heavily on models from OpenAI.
Models are better about this now for a couple of reasons.
One, they can handle much larger prompts. So after a model finishes training, the system prompt can provide information about the model—its knowledge cutoff date, what model it is, who made it, etc.—and then the model should be able to answer that.
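That metadata block can be as simple as a templated preamble prepended to every conversation. A minimal sketch, with entirely hypothetical names and values:

```python
# Sketch of an identity preamble supplied after training via the system
# prompt. All field values here are made up for illustration; real
# deployments bake in their own model name, developer, and cutoff.
IDENTITY = {
    "model_name": "ExampleLM-1",      # hypothetical
    "developer": "Example Labs",      # hypothetical
    "knowledge_cutoff": "2024-06",    # hypothetical
}

def identity_preamble(meta: dict) -> str:
    return (
        f"You are {meta['model_name']}, a language model built by "
        f"{meta['developer']}. Your training data ends around "
        f"{meta['knowledge_cutoff']}; say so when asked about recent events."
    )

print(identity_preamble(IDENTITY))
```

Because the model is stateless, this preamble has to ride along with every request; nothing about the weights changes.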
Two, they can be trained in post-training with instructions so they understand what they are.
So you see less of this as an issue now than you did before. And it’s one of the reasons why increasing the context window size (the amount you can fit into the prompt) had such a big impact. When models were much smaller—like only 2,000 tokens—spending a few hundred words just to establish the model’s identity and metadata was kind of wasteful. Now, that’s not really a problem.
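The back-of-envelope math makes the point. Assuming the rough rule of thumb that English runs about 1.3 tokens per word (the real ratio depends on the tokenizer):

```python
# Back-of-envelope: what fraction of the context window an identity
# preamble eats. The 1.3 tokens-per-word ratio is a rough assumption.
def metadata_overhead(words_of_metadata: int, context_tokens: int) -> float:
    tokens = int(words_of_metadata * 1.3)
    return tokens / context_tokens

# A ~300-word identity preamble in an old 2,048-token window vs. a
# modern 128k-token window:
old = metadata_overhead(300, 2048)     # roughly a fifth of the budget
new = metadata_overhead(300, 128_000)  # well under one percent
```

At 2,048 tokens, identity metadata crowded out the actual task; at modern window sizes it's a rounding error.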