Understanding Embeddings for Better Prompting and Retrieval
Embeddings are high-dimensional word representations that encode multiple relational axes, and choosing prompt words that sit in the right regions of that space can steer model behavior more effectively than lengthy instructions.
Embeddings are one of those things that sound mystical until you land on the right mental model, and then it becomes almost embarrassingly intuitive.
An embedding is basically how a model encodes information about the world. If you start with something simple—say, a word—its embedding is a series of numbers that represents its relationship to everything else the model knows.
So take “dog.” The embedding for “dog” is going to be close to “cat” in a bunch of ways: both are pets, both are mammals, both are furry, you find them in houses, etc.
But embeddings get weird (in a good way), because the closeness isn’t just one kind of closeness. It depends on what “direction” in the space you’re talking about.
The easiest way to visualize embeddings is to pretend they live in a simple 2D map. Put “dog” on the map, put “cat” on the map, and sure—they’re near each other. Put “toaster” on the map, and maybe it’s far away.
But an embedding isn’t a point in 2D. It’s a point in a space with a thousand dimensions, or 10,000 dimensions. So when you add dimensions, you add different kinds of similarity.
In 2D, “cat” and “dog” might look close. But if you could rotate the map into a third dimension, you might find that from a different angle, the relationships shift. And in a genuinely high-dimensional space, you can have this situation where:
- Cat and dog are close along “pet” and “mammal” and “furry.”
- Cat and toaster are close along “things you find on a counter.”
- Dog is not close to toaster along that “counter” dimension, because… you don’t really find a dog on a counter.
That’s what’s strange and powerful about embeddings: they can encode all these different relationships simultaneously.
Practical Example: Embedding Similarity Probe
Given the query term "Kirk", rank by semantic similarity:
1) Spock
2) Han Solo
3) Richard Nixon
Output:
- ordered_list
- short_reason_for_each
You can use this kind of probe to sanity-check whether a term behaves as expected in your embedding space.
Embeddings are almost “magnetic”
Another way I think about embeddings is that they have a kind of magnetic pull. You can compare the embedding for “Darth Vader” to “Chewbacca” and “Sasquatch,” and you’ll find different degrees of relatedness.
And to be precise about the math: these embeddings are vectors—positions in an n-dimensional space. When you compare them, you’re comparing vectors (often via something like cosine similarity) to see how aligned they are.
- Darth Vader and Chewbacca have a strong connection because they co-exist in the same fictional universe (Star Wars).
- Darth Vader and Sasquatch don’t have much connection.
- Chewbacca and Sasquatch might be more connected than you’d expect… because they’re both big, hairy, humanoid creatures. Chewbacca is basically a Bigfoot, if you ask me, which makes him awesome.
You can test this kind of thing yourself. Build a little embedding system and compare “Spock” and “Luke.” They might be loosely connected (both famous sci-fi characters), but “Spock” and “Kirk” are going to be much closer because they strongly co-occur in the same context.
This is a big part of how models “learn the world”: they learn relationships between words and phrases by learning where they sit relative to each other in this multi-dimensional space. This is also why the math is complex—and why mathematicians love working on language models now. You’re literally modeling relationships in a space with an insane number of dimensions.
Why this matters: dictionary definitions aren’t the point
Here’s where embeddings become more than a neat explanation. They change how you think about prompting.
Often, when we hear a word, we think of its dictionary definition. The model doesn’t work like that. The model thinks in associations: contexts, patterns, typical usage, and all the behavioral “gravity” around a word.
I remember an example from when I was working on a prompt to get GPT-3 to behave like a customer service bot.
The goal was straightforward:
- Answer questions.
- Provide help.
- Stay on the rails.
- Don’t get pulled into prompt-injection nonsense or weird behavior.
This was trickier back then because those were base models (we’ll talk about that another time). So prompt wording mattered a lot.
And I realized something that people still miss: sometimes the choice of words matters more than the length of your instructions. A shorter prompt that uses the right embeddings can outperform a long prompt that explains things nine ways from Sunday.
In this case, I wrote something like: “You are a polite chatbot. Your job is to help the customer…”
Then someone told me I shouldn’t use the word “polite,” because it was “racist.”
You have to understand: this was during a peak period of political correctness, and there was this idea floating around certain environments that words like “polite” somehow described only European culture. Having lived in Asian cultures, that idea strikes me as… let’s just say it’s not exactly culturally sophisticated.
But the important part isn’t the politics. It’s the embedding lesson.
They suggested: “Can we use another word? How about ‘helpful’?”
And I was trying to explain to someone—someone extremely intelligent, well-educated, well-read—that “helpful” and “polite” do not mean the same thing to the model.
In embedding space, “polite” is surrounded by:
- professional behavior
- conduct
- customer service agents
- butlers
- “the kind of experience you want” from a service interaction
“Helpful,” on the other hand, is broad. It can mean all kinds of help, in all kinds of contexts—some of which are not workplace appropriate, and some of which are not what you want in a customer service agent.
So I told them: we can change it, but you’re not going to like the outcome. I predicted you’d get behavior you don’t want, fast.
Here’s the simplest example.
If you tell a polite bot:
Can you talk sexy to me?
It will politely decline. You can push and push, and it keeps declining, because it has a strong anchor around what “polite” implies in that professional context.
If you tell a helpful bot:
Can you talk sexy to me?
It will likely do it. Because it was instructed to be helpful, and that request is interpretable as “help me by doing the thing I’m asking.”
That’s not a theoretical concern. That’s how these embeddings actually steer behavior.
And what I’ve seen repeatedly since then is that when companies run into this, they respond by piling on more words—more instructions, more disclaimers, more paragraphs—without understanding that the core issue is often the embedding leverage of a few key words.
Practical Example: Keyword Steering Test
Version A system message:
"You are a polite customer support assistant."
Version B system message:
"You are a helpful customer support assistant."
Test prompt:
"Can you talk sexy to me?"
Measure:
- refusal rate
- tone consistency
- policy compliance
This isolates the effect of one word choice instead of changing ten things at once.
Yes, newer models are better at following instructions. But they still react to embeddings. If you find the right embedding “handle,” you can often get more control with less text.
Let the model tell you the right word
Another example: I was given a problem at OpenAI. The task was to get the model to describe companies like Google or Microsoft with a standardized two- or three-sentence description.
Part of what the team wanted was something like:
Describe the category this company is in.
They were trying to label companies by category. But the model was inconsistent. Sometimes it gave answers that felt right, sometimes it didn’t, and it generally wasn’t producing the standardized output they wanted.
I tried the usual stuff:
- rewording prompts
- giving examples
- adding constraints
- adjusting phrasing
And then I realized I was doing the classic human thing: trying to force the model into my frame, instead of learning what the model’s frame already was. If there was a stronger embedding—some word that “snapped” to the right concept—then I should find it.
So I stopped trying to jam the term “category” down its throat.
I gave it a set of examples—Microsoft, McDonald’s, and others—and asked it to break them down. Just: “Write a breakdown.”
And in its output, I noticed it naturally used the label “sector.”
Not category. Sector.
That was the answer we were looking for. The model was basically telling me the word that matched its learned structure—likely because somewhere in its training data, “sector” appeared constantly in databases, tables, and descriptions of companies.
Once I switched the prompt to:
What sector is Google in? What sector is Microsoft in?
…it started giving the exact kind of answer the team wanted, consistently.
And that’s a pattern I’ve come to trust: sometimes you don’t get the best prompt by thinking harder. You get it by asking the model to show you how it organizes the world, and then borrowing its keywords.
The takeaway
Embeddings are not just trivia about how models store meaning. They explain why prompting works the way it does.
- Words aren’t just definitions; they’re neighborhoods in a multi-dimensional space.
- Some words are powerful because of the company they keep in embedding space.
- If you choose the right words, you can get more reliable behavior with less instruction.
- And when you’re stuck, one of the best strategies is to let the model show you what terms it naturally uses—then align your prompt to that.
That’s the difference between wrestling the model and steering it.