← Blog

Why Does AI Forget What You Told It?

Published 2026-07-04 · Updated 2026-07-04

An AI chatbot forgets because it has no memory of its own. Each time you send a message, the model re-reads the entire conversation from scratch, but only as much as fits inside a fixed limit called the context window. Once the conversation grows past that limit, the oldest parts are dropped, and the model can no longer see them.

That single fact explains nearly every instance of apparent forgetting. The model is not being lazy or ignoring you. Between your messages, it retains nothing. What looks like memory is the model reading a transcript handed to it fresh on every turn.

Context windows: the model's working memory

The most useful way to picture a context window is as working memory, comparable to the small amount of information you can hold in mind while doing mental arithmetic.

Everything the model has available about your current conversation must fit inside this window: your messages, its own replies, any system instructions, and any documents you pasted in. It is not permanent storage. It is a fixed-size buffer, and when it fills, something has to be removed to make room.

This is why the model can discuss a 500-word email flawlessly but starts losing track during a long, rambling chat. The email fits comfortably within the window. The three-hour conversation does not.

Modern models have large windows, but they remain finite. Anthropic's current Claude models, for example, offer context windows of either 200,000 tokens or 1,000,000 tokens depending on the model, according to Anthropic's model documentation. A 1,000,000-token window is substantial, roughly 555,000 words by Anthropic's own estimate, but a long working session with pasted files and extended back-and-forth can still approach it.

What is a token?

The term "token" recurs throughout this topic, so it is worth defining.

A token is a chunk of text, usually a bit smaller than a word. Common short words are often a single token, while longer or unusual words are split into several pieces. Spaces and punctuation count as well.

For a rough sense of scale, OpenAI's documentation gives a rule of thumb: for English text, one token is approximately four characters, or about 0.75 words, per OpenAI's developer documentation. So 1,000 tokens is roughly 750 words.

This matters because the context window is measured in tokens, not messages. The model does not count how many things the user said; it counts total tokens. A conversation full of long pasted articles fills the window far faster than a chat of short one-line questions, even when both contain the same number of messages.

What happens when the conversation gets too long

When a conversation exceeds the context window, the system has to decide what to keep. There are two common strategies, and the application you are using may adopt either one.

The first is truncation: the oldest messages are dropped so the recent ones still fit. If you told the AI your name at the very start of a long session, that introduction may have quietly fallen out of the window. The model is not refusing to remember it; the text is no longer available to it.

The second is summarization: instead of deleting old messages outright, some applications compress them into a shorter summary and keep that summary in the window. This preserves the gist but loses specifics. A summary might retain "the user is planning a trip to Japan" while dropping the exact dates, the hotel name, and the budget spelled out earlier.

Either way, the effect is the same from your side. Details from early in a long chat grow thinner or vanish, while recent details stay sharp.

Why a brand-new chat starts blank

Opening a new conversation resets the window to empty. The model does not carry anything over from your previous chat, because from its perspective nothing was ever stored. Each conversation is independent.

This surprises people who expect the AI to remember them the way a person would. But unless the product has a specific, separate memory feature (covered below), yesterday's chat is simply unavailable to today's conversation. The model is not choosing to forget you overnight. There was never a place where information about you was kept.

The "lost in the middle" effect

A related question: why does the AI often recall something you said at the very start of a long chat, and your latest message perfectly, but fumble a detail from the middle?

Researchers documented this in a 2023 paper titled "Lost in the Middle: How Language Models Use Long Contexts" by Nelson F. Liu and colleagues. Testing how models use long inputs, they found that performance was typically highest when the relevant information sat at the very beginning or the very end of the context, and dropped noticeably when the model had to pull information from the middle of a long context.

In other words, even when a detail is technically still inside the window, the model attends most reliably to the beginning and the end of the context. A fact positioned in the middle of a long conversation can be present but overlooked, which is difficult to distinguish from forgetting.

This is why restating an important point near the end of a long chat often resolves the problem: it moves the fact from the middle of the context to the end, where the model attends most reliably.

Three different kinds of "knowing"

Much of the confusion clears once you separate three things the AI draws on. They are distinct, and they fail in different ways.

Training knowledge is what the model learned during its creation, fixed before you ever arrived. It is why the model knows general facts and how to write. It is broad but frozen: it does not include your specific conversations, and it has a cutoff date, so it will not know recent events on its own.

Context is the live conversation in the window right now, the subject of this article. It is specific to you and current, but temporary. It is lost when the window fills or the chat ends.

Persistent memory is a separate feature some applications add on top. It saves selected facts about you (your name or your preferences, for instance) outside the conversation and re-inserts them into the context of future chats. This is the only one of the three that can make an AI remember you across sessions, and it exists only if the specific product offers it.

When people say the AI forgot, they have usually reached the limits of context while assuming they were dealing with persistent memory.

Practical tips for working within the limits

You cannot expand the window, but you can work with it deliberately.

  • Restate key facts when they matter. If a detail from earlier is important to the answer you want, repeat it in your current message. This pulls it into the active window and toward the end of the context, sidestepping both truncation and the lost-in-the-middle effect.

  • Summarize long chats yourself. When a conversation gets long, ask the AI to summarize the important decisions and facts so far, then continue from that summary. You are doing deliberately, and accurately, what automatic summarization does approximately.

  • Start fresh for new topics. A window cluttered with an unrelated earlier discussion can crowd out and distract from your new question. A clean chat often produces sharper answers.

  • Front-load the important material. Put your key instructions and constraints at the top of a long message rather than in the middle of a large block of text.

  • Keep pasted material lean. Since documents consume tokens quickly, paste only the sections you actually need rather than an entire file. More room in the window means fewer things falling out of it.

  • Use the memory feature if the app has one. If your tool offers a persistent memory setting, enabling it lets it carry a handful of facts across conversations. Check what it is storing, since that is the only path to genuine cross-session recall.

Once you view the AI as a system reading a fixed-size transcript rather than a mind that remembers, its forgetting becomes predictable. It is not a flaw in your prompting. It is the expected behavior of a system with a finite window, and understanding the shape of that window is what lets you use it well.