Base Models vs Post-Training: What Each Layer Does
Base models are broad, raw text learners; post-training adds an instruction-driven layer that makes them far more useful but can also overfit them. The takeaway: balance raw capability with careful post-training and prompt design.
One of the fun things about working on models back in the early part of this decade was getting to play with a base model.
Put simply, the first models we had—think GPT-2 and GPT-3—were basically just models trained on an entire corpus of text. And by that I mean whatever text you could put in there. It wasn’t even really filtered that much; the text was just fed into the model, the model learned from it, and then you went in with prompts and tried to see what it could do.
That was the magic. That was the miracle.
I could take a bunch of text—there are examples of this now with just a few hundred lines of code—feed it literally random text: Wikipedia, Reddit, open-source data, public domain books, what have you. And on the other side of it, you’d get a model you could ask questions to, and it could have conversations, without you ever explicitly telling it that it was a chatbot in training.
I might tell it in a prompt, “Pretend you’re a chatbot.” And that’s it. Just those four words: pretend you’re a chatbot. And magically, it became a chatbot.
This was why so many of us were so excited to be playing with these models: they weren’t trained to do anything specific. You could ask it to translate from English to Japanese and it would do it. You could ask it to try math problems it couldn’t have been explicitly trained on, and it could still figure out how to do them. It’s easy to forget where we started—with these raw base models that, by themselves, were intelligent.
The problem was that they were a challenge to use. You had to be a prompt engineer to get really good results out of them, which made me feel very special as the first person hired as a prompt engineer: the internal prompt whisperer, the person people went to when they wanted to ask, "Hey, how do you do this?"
But that’s also a terrible situation for the rest of the world. It’s not efficient to just ask Andrew.
Thankfully, there was a way to improve these models. But that improvement also came with a cost.
When people talk about a model being a base model, they mean the thing you get when you take that big roll of text, run it through the algorithms, and end up with a model that tries to predict: “If I say this, what comes next?”
You can give it prompts and examples. You can say things like:
Hey, you are a professor at a leading research university in physics. Your student asked you this question…
And it’ll give you a great answer. That’s amazing.
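To make that concrete, here's a minimal, purely illustrative sketch of the kind of prompt scaffolding a prompt engineer would build. A base model only predicts "what comes next," so you wrap the question in a persona-setting preamble whose natural continuation is the answer you want. No real model or API is called here; the function name and wording are my own.

```python
def build_base_model_prompt(question: str) -> str:
    """Wrap a question in a persona-setting preamble for a
    completion-only base model (illustrative sketch)."""
    # A base model just continues text, so we stage text whose
    # most likely continuation is a professor's answer.
    return (
        "You are a professor at a leading research university in physics.\n"
        "Your student asked you this question:\n\n"
        f"Student: {question}\n"
        "Professor:"
    )

prompt = build_base_model_prompt("Why is the sky blue?")
print(prompt)
```

The trailing `Professor:` is the whole trick: the model completes the transcript, and the completion is your answer.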
But the problem is: you want to get rid of the prompting.
Instruct models and post-training
So what happened is that OpenAI first created what they called the instruct models.
The workflow looked like this:
- Train the base model on lots of text.
- Do what’s called post-training.
Post-training is where you give the model a bunch of instructions and examples of how it should behave.
The first example most of the public saw of this was ChatGPT.
ChatGPT was the GPT-3.5 model, fine-tuned on a huge number of examples where it was told, essentially:
- If you’re the assistant and a user asks you a question…
- And the user says, “What color is the sky?”
- The assistant says, “Blue.”
Millions of Q&A examples like that. That became the final step of training.
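The Q&A examples above can be sketched as training records: each one pairs a user turn with the assistant turn the model should produce. This is a hypothetical, simplified schema for illustration only; the actual datasets and formats the labs use differ and aren't public in detail.

```python
import json

def make_sft_example(user_message: str, assistant_message: str) -> str:
    """Turn one Q&A pair into a JSONL training record
    (hypothetical schema, illustrative only)."""
    record = {
        "messages": [
            {"role": "user", "content": user_message},
            {"role": "assistant", "content": assistant_message},
        ]
    }
    return json.dumps(record)

# Millions of lines like this one become the post-training dataset.
line = make_sft_example("What color is the sky?", "Blue.")
print(line)
```

The point of the format is the role labels: the model learns "they're the user, I'm the assistant" by seeing that structure over and over.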
So the model learned: “Oh—this is what I’m supposed to do. They ask me questions, I answer. They’re the user, I’m the assistant, and this is how I process things.”
But underneath all of that is still the base model. You take the base model, then you add this other layer on top.
The tradeoff: more useful… and also easier to mess up
What’s happened since then is that the post-training layer has extended tremendously.
We started teaching models skills they maybe didn't learn as well in the base phase. We gave them more structured data like code and all kinds of other things, and we tried to make them much more capable. That has made them incredibly useful.
But there's a tradeoff: you can overfit a model.
Overfitting is when the model gets trained too heavily in post-training. Sometimes you’ll see a model come out that scores really well on leaderboards, but it’s not actually that good to use, because it was over-optimized in post-training for whatever those benchmarks were measuring.
All of which is to say: every time you sit down and talk to a model (or build on a model), you have to remember that most of the models we use right now are base models underneath.
Why prompt “jailbreaks” work
This is also why clever prompt engineers are able to “break” models and get them to do things outside of their post-training.
That training came later. And sometimes you can get the model to assume it can ignore that layer, and you can speak to the model below. It’s kind of like a form of hypnosis.
It’s fun to do, but it’s also frustrating for AI labs, because the risk is you have a model you think isn’t going to do a thing—but it does know, in that base layer, how to do that thing.
Maybe it has information about nuclear materials. Maybe somewhere across the internet it encountered data about creating some sort of bioweapon. You don’t want it to provide that, so you do extensive post-training to prevent it.
A lot of the prompts people use to “break” models are basically trying to break through the post-training and get to the base model underneath.
Some people are against that. Some people think you should be able to use models for what you want. But remember: the frontier labs are large corporations. They can find themselves having to answer in front of Congress, and they face a lot more scrutiny than smaller labs. It’s a very different question for them.
What I miss about the old days
I miss being able to explore base models.
The early instruct models were very useful for the average person, but very frustrating to me, because I could get base models to do many more things.
And there were some weird effects that showed up as models got bigger and “better.”
For example: GPT-4 was a much bigger, much more capable model than GPT-3.5, but GPT-4 wasn’t as good at chess as GPT-3.5.
In theory, GPT-4 should have been really good at chess—and probably was at the base model level. Something happened in post-training.
You’ll hear people talk about this when a model regresses. They’ll say, “Hey, this should be better at that.” And often it’s because something in post-training went amiss. It’s not that the model isn’t smart enough. It’s that the finishing school—whatever you want to call it—made it not as good.