Posts
Browse the complete archive of prompting notes in this reading order.
How I Became OpenAI's First Prompt Engineer
Dive deep into an AI frontier, rigorously test and document prompts, and openly share useful findings; that path can make you stand out and land a pioneering role, as it did for OpenAI's first prompt engineer.
The Evolution of Prompts: From Completion to Systems
Prompts have evolved from pattern-based completion to outcome-focused instructions, and the practical takeaway is to provide the simplest, clearest description of the finished product and its success criteria so the model can deliver the desired outcome.
Base Models vs Post-Training: What Each Layer Does
Base models are broad, raw text learners, while post-training adds an instruction-driven layer that greatly increases usefulness but can lead to overfitting, so the takeaway is to balance raw capabilities with careful post-training and prompt design.
Model Identity and Statelessness: Why Explicit Context Matters
LLMs are stateless and may not know their own identity unless explicitly provided in prompts or post-training guidance, and larger context windows make it easier to supply that metadata upfront.
The Stateless AI Guessing Game: A Prompting Lesson in Memory
Stateless models don’t remember between turns, so you can still play a guessing game by encoding the chosen object into the transcript, such as base-10 encoding or a foreign-language rendering, to persist it across questions.
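A minimal sketch of the encoding trick described above, with hypothetical helper names: the secret is rendered as base-10 code points and carried inside every turn's transcript, so a stateless model can recover it on each turn without any real memory.

```python
def encode_secret(word: str) -> str:
    # Render the secret as space-separated base-10 code points so it
    # persists in the transcript without being readable at a glance.
    return " ".join(str(ord(c)) for c in word)

def decode_secret(encoded: str) -> str:
    # Reverse the encoding: parse each number back into a character.
    return "".join(chr(int(n)) for n in encoded.split())

# The encoded secret rides along in the transcript across turns.
secret = encode_secret("giraffe")
transcript = f"[secret: {secret}]\nQ1: Is it an animal?"
```

A foreign-language rendering of the secret works the same way; anything the model can decode but the player won't casually read will do.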
GPT Demo Set List: Early Prompt Patterns That Still Hold Up
Curated prompts from a GPT-3 demo set reveal practical capabilities: token-based world view, autocomplete, structured text, translation, summarization, tone and persona control, multi-voice outputs, and turning unstructured text into structured data.
Early Sentence-to-Email Prompts: A Foundational Transformation Pattern
Turn a minimal instruction into a polished email by providing a handful of consistent examples and letting the model complete the pattern, illustrating in-context learning and rapid productization.
GPT-3 Emoji Story Demo: Narrative Compression in Tokens
GPT-3's emoji storytelling demo shows how models compress meaning into simple token choices that render as visuals, revealing how a narrative can be told with emojis and signaling the move from text to visual tokens.
Separating Instruction from Content: A Core Prompt Reliability Pattern
Clearly separate instruction from content with a reliable delimiter (three hashtags is often the strongest choice) and present structured data as Markdown, XML, or JSON to reduce ambiguity and improve model performance.
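As a hypothetical sketch of the delimiter pattern above, a small helper can wrap untrusted content between hashtag markers so the boundary between instruction and content is unambiguous:

```python
def build_prompt(instruction: str, content: str) -> str:
    # Three hashtags on their own line mark where the instruction
    # ends and the content to operate on begins (and ends).
    return f"{instruction}\n###\n{content}\n###"

prompt = build_prompt(
    "Summarize the text between the ### markers in one sentence.",
    "Quarterly revenue rose 12% while costs held flat.",
)
```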
Magic Words in Prompting: Domain Terms That Steer Behavior
Anchor prompts with domain-specific terminology and canonical formats to steer the model toward the desired structure and tone.
Invoking Experts in Prompts: When Persona Framing Improves Results
Invoking an expert persona in prompts steers the model to adopt a relevant reasoning frame, yielding clearer explanations and better solutions.
GPT-3 Grammar and Style Editing in Practice
GPT-3 enables advanced grammar and style edits, tone adjustment, coherence improvements, and format transformations across text without explicit training as a dedicated grammar tool.
Magic Phrases for Moderation: Prompt Patterns That Improve Safety Calls
Use standardized prompts and rating frameworks (like ESRB) along with explicit guidelines and practical examples to achieve more consistent, scalable AI-driven content moderation.
GPT-3 for Regex, Bucket Policies, and Solidity Tasks
GPT-3 can convert tedious, syntax-heavy tasks into actionable tooling by generating regex patterns from plain English, crafting precise bucket policies, and explaining or auditing Solidity contracts.
Rethinking best_of in GPT-3: Why It Misleads
Relying on best_of to improve LLM accuracy is misguided; the practical fix is to define clear task boundaries with better prompts and use outlier examples to ground interpretation, which can let you use smaller models and single-shot prompts while reducing cost.
The Fifth-Grade Summary Moment: Audience-Aware Compression
Generative summarization creates original, audience-tailored explanations rather than mere extracts, so specify the target reader and evaluate quality by usefulness to that audience.
Mini Prompts for Trick Questions and Nonsense Inputs
A brief upfront prompt tells the model to distinguish serious questions from nonsense or trick questions and to respond appropriately.
Prompts to Reduce Hallucinations: Practical Control Patterns
Teach models to say 'I don't know' when unsure by labeling truthful, false, and unknown statements, reducing hallucinations and boosting accuracy through prompting and fine-tuning.
Cross-Temperature Hallucination Testing for Sanity Checks
Cross-check AI outputs by comparing responses across temperatures and against smaller models to quickly flag hallucinations and verify with real sources.
Temperature in LLMs Explained: What It Actually Controls
Temperature adds a controlled amount of randomness to LLM sampling, letting the model explore alternative paths rather than boosting creativity; it helps break repetitive outputs but risks nonsensical results at high values and is often unnecessary with modern models.
Seeded Creativity for LLMs: Controlled Randomness That Helps
Generate random seeds outside the model, feed them into prompts, and let the LLM produce varied yet coherent output.
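A minimal sketch of the pattern above, with hypothetical ingredient lists: randomness is drawn outside the model and injected into the prompt as concrete details, so the model can write coherently around varied inputs instead of relying on sampling temperature.

```python
import random

def seeded_prompt(task: str, rng: random.Random) -> str:
    # Draw concrete random ingredients outside the model and inject
    # them into the prompt; the model writes coherently around them.
    settings = ["a lighthouse", "a night train", "a flooded library"]
    moods = ["wistful", "triumphant", "uneasy"]
    return (f"{task}\n"
            f"Setting: {rng.choice(settings)}\n"
            f"Mood: {rng.choice(moods)}\n"
            f"Seed: {rng.randint(0, 9999)}")

print(seeded_prompt("Write a six-line poem.", random.Random(42)))
```

Passing an explicit `random.Random(seed)` also makes the varied prompts reproducible for testing.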
Creating Better Quiz Distractors with LLMs
Crafting plausible quiz distractors is hard; a practical workaround is to use a smaller model with a higher temperature to generate incorrect-but-plausible options, though results can still vary.
Bracketing Letters for Wordle: Token-Level Prompt Control
Token-level input can derail Wordle-like tasks; using a bracketed, character-level representation lets the model track each letter and constraint reliably.
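The bracketing idea above can be sketched in a few lines: wrapping each letter prevents the tokenizer from merging characters into multi-letter tokens, so the model can track positions and constraints individually.

```python
def bracket_letters(word: str) -> str:
    # Wrap each character so the tokenizer can't merge letters into
    # larger tokens; the model then sees one token region per letter.
    return "".join(f"[{c}]" for c in word)

bracket_letters("crane")  # "[c][r][a][n][e]"
```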
The Missing Bracket: How Tiny Formatting Errors Break Outputs
A missing closing bracket in a legal passage introduced ambiguity and produced inconsistent model results, showing that carefully reading and correcting the input is essential for reliability.
Prompt Repetition and Rephrasing: A Reliability Tactic That Lasts
Repeat or rephrase the prompt by placing it at the top and/or bottom to keep the model anchored and improve consistency on long or complex inputs.
Prompt Size Reduction Checklist: Cut Tokens Without Losing Quality
Use a practical prompt-optimization checklist to reduce token usage by cleaning up examples, cutting verbosity, narrowing labels, and batching multiple classifications in a single API call for faster, cheaper results.
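The batching item in the checklist above can be sketched as a hypothetical prompt builder: many inputs are numbered into a single call, and the model is asked to return one label per line, which is faster and cheaper than one API call per item.

```python
def batch_classification_prompt(items: list[str], labels: list[str]) -> str:
    # Classify many inputs in one call: numbered items in,
    # one numbered label per line out.
    numbered = "\n".join(f"{i + 1}. {text}" for i, text in enumerate(items))
    return (f"Classify each item as one of: {', '.join(labels)}.\n"
            f"Reply with one label per line, numbered to match.\n"
            f"###\n{numbered}")
```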
Small Model Advantages: When Smaller LLMs Outperform Bigger Ones
For large documents, extracting key points, phrases, and entities with a small model is cheaper, faster, and often more reliable than generating a full summary.
Small Models, Big Knowledge: Prompting Past the First Guess
Smaller language models aren’t inherently dumb; their true potential shows when prompts steer retrieval away from easy generalizations, unlocking non-obvious knowledge and cutting costs.
Using Small Models for Complex Natural-Language Tasks
Thoughtful prompting and lightweight schemas let small language models reliably convert flexible natural-language input into structured data for real-world tasks like scheduling, at a fraction of the typical cost.
Large Text Pattern Analysis with Prompted Models
Feed large batches of text into a single context window to extract overall patterns and sentiment across many posts, enabling scalable, non-sequential analysis while monitoring for hallucinations.
Prompt Maker: How to Teach Prompt Patterns by Example
Teach prompts by presenting models with a consistent, example-rich pattern so they infer the task and generate high-quality new prompts.
Compute at Scale: Growth, Limits, and AI Demand
Compute needs will rise with human ambition, potentially to about 1,000× today's levels, and will be met through strategic, highway-like infrastructure expansion and smarter use rather than by chasing unlimited physical limits.
How Small Can AI Be? Practical Limits and Opportunities
Smaller, compressed AI models trained on task-specific data can be genuinely useful on ordinary hardware, enabling distributed, cooperative intelligence rather than relying solely on ever-larger models.
Context as an AI Lever: The Compounding Effect of Longer Windows
Expanding context length unlocks new capabilities, enabling reliable handling of long documents, deeper reasoning, and more practical AI tasks.
Small Capabilities, Big Ramifications in Prompt Design
Expanding capabilities such as larger context windows and structured representations like arrays unlock significant practical gains, enabling handling of large codebases and the creation of more complex games.
Scaffolding Long-Form Content: Prompt Patterns for Coherence
Break long-form writing into small, solvable steps, then progressively expand with more scenes, motivations, and reversals to produce a complete piece.
Radio Play Scaffolds: A Better Prompt Pattern for Story Generation
Use a radio-play scaffold with a Narrator, Characters, and an optional Editor to structure prompts so the model generates longer, more coherent narratives with clear direction.
Building AI Choose-Your-Own Adventures with Prompt Scaffolding
Create AI-driven choose-your-own-adventure experiences by grounding the model with a map, state-tracking, and short scene summaries to preserve continuity and guide branching.
Character-Threaded Summarization for Long Documents
For long texts, build per-entity timelines (characters, locations, key events) and then fuse them into a coherent final summary to preserve reversals and changing perspectives.
Memory in Conversational AI: Why Context Persistence Matters
Equipping conversational AI with memory of past interactions creates coherent, context-aware dialogue and improves personalization beyond single-turn prompts.
Outcome-Oriented Prompting: Define Success, Then Generate
Shift prompting from instructing the start to defining verifiable outcomes and success tests, then use reasoning-enabled models to draft, evaluate, and iterate until the result meets objective criteria.
Style Guides for AI Writing: Getting a Specific Voice
To get AI to write in a specific voice, first have it analyze and articulate the target style, then prompt it to write using that explicit style guide.
Crystallized vs Fluid Intelligence in Language Models
Distinguish crystallized intelligence (memory of facts) from fluid intelligence (generalization) in language models and tailor evaluation and training to balance recall with robust reasoning.
Personal AI Evaluation Methods for Real-World Quality
Design and run your own diverse, task-specific evaluation suite to gauge AI model improvements beyond benchmarks, tailoring tests to your real use case and including multi-modal reasoning.
Understanding Embeddings for Better Prompting and Retrieval
Embeddings are high-dimensional word representations that encode multiple relational axes, and choosing prompt words that sit in the right regions of that space can steer model behavior more effectively than lengthy instructions.
Embedding-Based Retrieval Strategies That Actually Work
Embeddings are learned, high-dimensional representations used for retrieval, and the practical takeaway is to standardize and synthesize documents into retrieval-optimized representations rather than embedding raw text.
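A minimal sketch of the retrieval step, assuming embedding vectors have already been produced (by some embedding model not shown here): documents are ranked by cosine similarity to the query vector.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3) -> list[int]:
    # Return the indices of the k documents most similar to the query.
    scored = sorted(enumerate(doc_vecs),
                    key=lambda iv: cosine(query_vec, iv[1]),
                    reverse=True)
    return [i for i, _ in scored[:k]]
```

The post's point is that what you embed matters as much as how you rank: standardized, synthesized document representations retrieve better than raw text.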
Context vs Retrieval: A Practical Decision Framework
Use a cost-driven framework to decide whether to put data in the prompt, retrieve it via keywords or embeddings, or fine-tune, guided by a spreadsheet that compares input/output costs and time investment.
Grounding Prompts with Wikidata and SPARQL
Ground model outputs in Wikidata by constructing SPARQL queries with correct property and entity IDs, optionally aided by a lightweight query generator or retrieval workflow, to fetch real data and reduce hallucinations.
The Prompt Context Flywheel for Continuous Improvement
Periodically mine conversations, have an LLM propose updated prompts that reflect current context, and deploy the improved prompt as a living prompt context flywheel—either in production or via shadow testing—to steadily improve responses.
Fine-Tuning Fundamentals: When to Use It and When Not To
Fine-tuning is a final option after prompting and RAG, chosen for memorization of facts or generalization of behavior, with practical steps to test on small models first and format data accordingly (facts in the assistant message; behavior in user/assistant pairs) before scaling.
Fine-Tuning Methods Guide: SFT, DPO, and Beyond
Fine-tuning is a toolbox of SFT, DPO, reinforcement fine-tuning, and vision fine-tuning; pick the method by your goal (memorization vs generalization, explicit behavior, reasoning with graders, or robust augmentation) rather than defaults.
Cost Savings via Fine-Tuning Smaller Models
Fine-tune a smaller model on high-quality examples derived from a larger model to preserve performance while substantially lowering per-call costs, with potential to step down to even smaller models as you scale the dataset.
Model-Assisted Data Preprocessing for Better Fine-Tuning
Leverage an auxiliary model to preprocess, standardize, and enrich your training data before training, yielding cleaner, more consistent, and more informative data.
GPT Tools: Fast Prototypes, Real Constraints, and Shipping
OpenAI’s rapid progress rested on human coordination and practical tooling—such as a token counter and a four-model comparison report—more than perfect code.
Discovering Useful Libraries with AI Coding Prompts
Asking models to solve coding problems surfaces unfamiliar libraries and tools, often revealing ready-made solutions you can reuse in projects.
Code Refactoring with GPT-3: Practical Prompt Patterns That Work
Code-capable language models can automatically refactor entire codebases and translate code between languages to deliver faster, more efficient implementations.
Tool Makers vs Tool Users: Where Product Value Actually Lives
The key to real AI adoption is removing friction and focusing on usability, not just expanding capability.
Lessons from an Ambitious AI Build
Tackling a truly ambitious AI build forces intense, hands-on learning in prompt design, tool usage, and system design tradeoffs, yielding practical, scalable know-how for real AI apps.
Why I Didn't Launch AI Channels
High costs and slow response times with GPT-3 made AI Channels impractical as a consumer product, so I prioritized learning and joined OpenAI instead of launching.
Hackathons and Model Capabilities: What Fast Experiments Reveal
Hackathons and collaborative prompt exploration reveal a model's wide range of capabilities—from diagrams and spreadsheets to SVGs, STL files, 3D scenes, and mini apps—demonstrating practical ways to surface and showcase AI skills.
GPT-4 Vision Refrigerator Demo: A Practical Multimodal Moment
A fridge photo serves as a simple, human-centered demo to show GPT-4's multimodal understanding and practical usefulness.
Localization Techniques for Vision Models in Real Workflows
Improve localization in vision models by combining prompting strategies (order of description), grid-based coordinates, and tiled, coarse-to-fine analysis, optionally using segmentation to isolate objects.
Vision Models at the Frontier: What Changed and Why
Vision and video models are the AI frontier, capable of learning from images and sequences to reason about the real world, with synthetic data and multimodal prompts as practical levers.
Big and Small Models in Robotics: A Hybrid Architecture
Adopt a layered, multi-model architecture in robotics that pairs large, high-level models for complex reasoning with fast, specialized models for real-time perception and control, with coordinated handoffs to balance latency, capability, and safety.
The Uneven AI Frontier: Why Capabilities Arrive Jagged
Capabilities often arrive in messy, frame-by-frame forms rather than polished breakthroughs, so valuable insights come from imperfect experiments that hint at real potential.
The Frontier Is Wider Than It Looks
The frontier is wider than ever, and the key takeaway is to invest in reasoning-based prompting and a middle-layer classification to guide answers, enabling safer, cheaper, and more reliable AI.
Challenging AI Paper Claims with Practical Replication
Bold claims of AI limitations are often training artifacts in a fast-moving field; treat them as testable hypotheses and verify by re-running the experiments with varied data formats, so the model learns the relationships in its outputs rather than just the prompts.