Large language models have introduced a remarkable new programming paradigm: prompting. Rather than writing code that explicitly defines behavior, we write natural language instructions that the model interprets and executes. The same model, with different prompts, can be a translator, a coding assistant, a creative writer, or a logical reasoner.
Even more striking is in-context learning: the ability of LLMs to learn new tasks from a few examples provided in the prompt, without any parameter updates. Show a model three examples of English-to-French translation, and it can translate a fourth. This emergent capability—discovered rather than designed—has transformed how we interact with AI.
Prompting is not merely an interface concern. The difference between a poor prompt and an excellent one can mean the difference between a useless response and a brilliant insight. Understanding prompting is essential for anyone working with modern language models.
This page covers the art and science of prompting: from basic prompt structure to advanced techniques like chain-of-thought reasoning, retrieval-augmented generation, and prompt optimization. You will understand how to extract maximum capability from any language model.
At its core, prompting provides context that guides the model's next-token prediction. Understanding the mechanics helps explain why certain techniques work.
When you send a prompt to an LLM, the prompt establishes the "context" in which generation occurs. Everything the model knows about what should come next derives from this context.
Think of the prompt as conditioning the model's output distribution:
$$P(\text{response} \mid \text{prompt}) \neq P(\text{response})$$
Different prompts activate different "modes" of the model by constraining the likely continuations.
| Component | Effect on Distribution | Example |
|---|---|---|
| System prompt | Sets overall role/behavior mode | "You are a Python expert..." |
| Instructions | Constrains task and format | "Explain in 3 bullet points..." |
| Examples | Defines input-output pattern | "Input: 2+2, Output: 4" |
| Context | Provides relevant information | "Given the following article..." |
| Query | Specifies current request | "What is the main argument?" |
A well-structured prompt typically includes:
[System/Role Definition]
You are an expert data scientist helping with machine learning tasks.
[Task Description]
Your task is to explain the following concept clearly and concisely.
[Constraints/Format]
Provide your explanation in 3 paragraphs:
1. Intuitive overview
2. Technical details
3. Practical implications
[Examples] (optional)
Example:
Concept: Overfitting
Explanation: [detailed example explanation]
[Input]
Concept: Regularization
[Output Primer] (optional)
Explanation:
The output primer (partial response to complete) is particularly powerful—it forces the model to continue in the specified format.
Information at the beginning and end of prompts has the strongest influence due to attention patterns. Critical instructions should appear early (strong attention) and be repeated at the end (recency effect). Long contexts can cause 'lost in the middle' effects.
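As an illustration, the template above can be assembled mechanically. The `build_prompt` helper and its argument names are hypothetical conveniences, not a standard API:

```python
def build_prompt(role: str, task: str, constraints: str,
                 examples: str = "", user_input: str = "",
                 output_primer: str = "") -> str:
    """Assemble prompt sections in a fixed, predictable order."""
    sections = [role, task, constraints, examples, user_input, output_primer]
    # Drop empty optional sections; separate the rest with blank lines
    return "\n\n".join(s for s in sections if s)

prompt = build_prompt(
    role="You are an expert data scientist helping with machine learning tasks.",
    task="Your task is to explain the following concept clearly and concisely.",
    constraints="Provide your explanation in 3 paragraphs.",
    user_input="Concept: Regularization",
    output_primer="Explanation:",
)
```

Keeping section order fixed makes prompts easier to diff and test; the output primer always lands last, where the model will continue from it.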
In-context learning (ICL) is the ability of LLMs to learn new tasks from examples in the prompt without gradient updates. This is a remarkable emergent capability of scaled models.
Zero-Shot: Task description only, no examples
Translate the following sentence to French:
"Hello, how are you?"
Few-Shot (1-5 examples): Task demonstrated through examples
Translate English to French:
English: Hello
French: Bonjour
English: Thank you
French: Merci
English: How are you?
French:
Many-Shot (10-100+ examples): Extensive demonstration
# Zero-shot classification
zero_shot_prompt = """Classify the sentiment of the following movie review as Positive or Negative.

Review: "This film was an absolute waste of time. The acting was wooden and the plot made no sense."

Sentiment:"""

# Few-shot classification
few_shot_prompt = """Classify movie review sentiment as Positive or Negative.

Review: "I loved every minute of this masterpiece!"
Sentiment: Positive

Review: "Boring and predictable. I wanted my money back."
Sentiment: Negative

Review: "An incredible journey that left me in tears. Must see!"
Sentiment: Positive

Review: "This film was an absolute waste of time. The acting was wooden and the plot made no sense."
Sentiment:"""

# Many-shot with structured examples
def create_many_shot_prompt(examples: list[dict], query: str) -> str:
    """Create a many-shot prompt from labeled examples."""
    prompt = "Classify movie review sentiment.\n\n"
    for ex in examples:
        prompt += f"Review: {ex['text']}\nSentiment: {ex['label']}\n\n"
    prompt += f"Review: {query}\nSentiment:"
    return prompt

# Research finding: few-shot often outperforms zero-shot significantly,
# but adding more examples has diminishing returns after 5-10.
# Quality and diversity of examples matter more than quantity.

The mechanism behind ICL is an active research area. Leading theories:
1. Implicit Fine-tuning Attention layers perform something analogous to gradient descent in their forward pass. Examples in context create "meta-gradients" that temporarily adjust behavior.
2. Task Recognition The model recognizes the task from examples and retrieves relevant pre-trained capabilities. Examples serve as a "task identifier" rather than training data.
3. Induction Heads Specialized attention patterns (induction heads) copy patterns from earlier in context. [A][B]...[A] → [B]. Examples establish [input][output] patterns that are copied.
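The induction-head copying behavior can be mimicked with a toy lookup. A real induction head is a learned attention pattern, not a dictionary scan, but the input-output behavior is the same: find the most recent earlier occurrence of the current token and predict what followed it.

```python
def induction_predict(tokens: list[str]):
    """Predict the next token by copying what followed the most recent
    earlier occurrence of the final token ([A][B]...[A] -> [B])."""
    last = tokens[-1]
    successor = None
    for i in range(len(tokens) - 1):
        if tokens[i] == last:
            successor = tokens[i + 1]   # remember what followed [A]
    return successor

# Few-shot examples establish [input][output] pairs that get copied:
seq = ["English:", "Hello", "French:", "Bonjour", "English:", "Thanks", "French:"]
induction_predict(seq)  # -> "Bonjour" (the token that followed the earlier "French:")
```

The toy copies blindly; actual induction heads operate on learned representations, which is why the copied continuation can adapt to the new input rather than repeating it verbatim.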
Empirical Findings:
| Finding | Implication |
|---|---|
| ICL emerges at scale (~10B+ params) | Smaller models don't reliably do ICL |
| Example format matters more than content | Correct structure > correct labels |
| Label space defined by examples | Novel labels OK if exemplified |
| Performance varies with example selection | Random selection suboptimal |
ICL cannot teach fundamentally new capabilities—it can only activate capabilities the model already has from pre-training. If the model can't do a task zero-shot (even poorly), examples won't help. ICL also consumes context, reducing space for actual content.
Chain-of-thought (CoT) prompting is one of the most important prompting techniques, dramatically improving performance on reasoning, math, and multi-step tasks.
Standard prompting asks for an answer directly:
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each.
How many balls does he have now?
A: 11
Chain-of-thought prompting asks the model to show its work:
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each.
How many balls does he have now?
A: Roger starts with 5 balls. He buys 2 cans with 3 balls each.
So he gets 2 × 3 = 6 new balls.
Total: 5 + 6 = 11 balls.
Why it works: by generating intermediate steps, the model allocates more computation to the problem (more tokens means more forward passes), conditions each step on the steps before it, and exposes its reasoning so errors can be caught before the final answer.
# Standard prompting
standard_prompt = """Q: A farmer has 17 sheep. All but 8 run away. How many are left?
A:"""
# Model often outputs: "9" (wrong! The answer is 8)

# Zero-shot Chain-of-Thought
zero_shot_cot_prompt = """Q: A farmer has 17 sheep. All but 8 run away. How many are left?
A: Let's think step by step."""
# Model: "Let's think step by step. 'All but 8 run away' means that
# 8 sheep remain. So the farmer has 8 sheep left."

# Few-shot Chain-of-Thought
few_shot_cot_prompt = """Q: There are 15 trees in the grove. Grove workers plant trees today. After they are done, there will be 21 trees. How many trees did they plant?
A: There are 15 trees originally. Then there were 21 trees after planting. So they planted 21 - 15 = 6 trees. The answer is 6.

Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
A: There are originally 3 cars. 2 more cars arrive. So there are 3 + 2 = 5 cars. The answer is 5.

Q: A farmer has 17 sheep. All but 8 run away. How many are left?
A:"""
# Model produces correct step-by-step reasoning

# Self-Consistency: sample multiple chains, take majority vote
from collections import Counter

def self_consistent_cot(model, prompt, n_samples=5, temperature=0.7):
    """Generate multiple reasoning chains and vote on the answer."""
    answers = []
    for _ in range(n_samples):
        response = model.generate(prompt, temperature=temperature)
        answer = extract_final_answer(response)  # parse e.g. "The answer is 6."
        answers.append(answer)
    return Counter(answers).most_common(1)[0][0]  # majority vote

Self-Consistency
Generate multiple reasoning chains (with temperature > 0), extract answers, return majority vote:
| Method | Accuracy (GSM8K) |
|---|---|
| Standard prompting | 18% |
| CoT prompting | 56% |
| CoT + Self-Consistency (10 chains) | 74% |
Tree of Thought (ToT)
Explicit search over reasoning paths: generate several candidate thoughts at each step, evaluate them, and expand only the most promising branches, backtracking when a path dead-ends.
Least-to-Most Prompting
Decompose before solving: first ask the model to break the problem into simpler subproblems, then solve them in order, feeding each answer into the next.
Analogical Reasoning
Recall relevant examples before answering:
First, recall a similar problem you know how to solve.
Then, use that approach for the current problem.
CoT provides the largest gains on multi-step reasoning: math word problems, logic puzzles, code debugging. For simple retrieval ("What is the capital of France?") or pattern matching, CoT may not help and can even hurt by introducing errors in the chain.
Effective prompting combines principles from cognitive science, software engineering, and empirical experimentation.
Vague prompts produce vague outputs. Be explicit about what you want:
Weak:
Write about machine learning.
Strong:
Write a 500-word blog post explaining gradient descent to software engineers
who know programming but not ML. Include a code example in Python.
Tone: Technically accurate but accessible.
Defining a role activates relevant knowledge and behavior:
You are a senior software architect at a Fortune 500 company
with 20 years of experience in distributed systems.
Review the following system design and identify potential issues.
Effective roles are specific rather than generic: they name a domain, a level of expertise, and a working context that match the task at hand.
# Pattern 1: Structured Output
structured_output_prompt = """Extract information from the following job posting.
Return your answer as JSON with these fields:
{
  "title": "job title",
  "company": "company name",
  "location": "location or 'Remote'",
  "salary_min": number or null,
  "salary_max": number or null,
  "required_skills": ["skill1", "skill2", ...]
}

Job posting:
###
{job_posting_text}
###

JSON:"""

# Pattern 2: Step-by-Step with Verification
verification_prompt = """Task: Determine if the following code has any bugs.

Approach:
1. First, describe what the code is intended to do
2. Trace through the code with a sample input
3. Identify any potential issues
4. For each issue, explain why it's a problem
5. Finally, state: BUGS FOUND or NO BUGS FOUND

Code:
{code}

Analysis:"""

# Pattern 3: Perspective Prompting
perspective_prompt = """Analyze this business proposal from three perspectives:

1. **Optimist**: What could go well? Best-case scenarios?
2. **Pessimist**: What could go wrong? Risks and weaknesses?
3. **Pragmatist**: Most likely outcome? Key uncertainties?

For each perspective, provide 3-4 specific points.

Proposal:
{proposal_text}

Analysis:"""

# Pattern 4: Self-Critique
self_critique_prompt = """{initial_response}

Now critically review your response above:
1. What assumptions did you make?
2. What might be incorrect or oversimplified?
3. What important considerations did you miss?
4. Provide an improved response incorporating these critiques.

Improved response:"""

Prompting is empirical. Small changes can dramatically affect output. Test systematically: vary one aspect at a time, evaluate on diverse examples, and keep a log of what works. The best prompt is the one that works on your data, not the one that looks most clever.
LLMs have knowledge cutoffs and can hallucinate facts. Retrieval-Augmented Generation (RAG) addresses this by providing relevant documents in the prompt.
User Query → Retriever → Relevant Documents → LLM → Response
                                  ↓
                  [docs injected into prompt]
Components: an embedder that maps documents and queries to vectors, a vector index for similarity search, a retriever that returns the top-k matches, and the LLM that generates an answer from the retrieved context.
from sentence_transformers import SentenceTransformer
import faiss

class SimpleRAG:
    def __init__(self, documents: list[str], llm_client):
        self.documents = documents
        self.llm = llm_client
        # Initialize embedder
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        # Create document embeddings
        self.doc_embeddings = self.embedder.encode(documents)
        # Build FAISS index
        dimension = self.doc_embeddings.shape[1]
        self.index = faiss.IndexFlatL2(dimension)
        self.index.add(self.doc_embeddings.astype('float32'))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        """Retrieve k most relevant documents for query."""
        query_embedding = self.embedder.encode([query])
        distances, indices = self.index.search(
            query_embedding.astype('float32'), k
        )
        return [self.documents[i] for i in indices[0]]

    def generate(self, query: str, k: int = 3) -> str:
        """Generate response using retrieved context."""
        retrieved_docs = self.retrieve(query, k)
        context = "\n".join(
            f"[{i+1}] {doc}" for i, doc in enumerate(retrieved_docs)
        )
        prompt = f"""Answer the question using the provided context.
If the context doesn't contain relevant information, say so.

Context:
{context}

Question: {query}

Answer (cite sources using [1], [2], etc.):"""
        return self.llm.generate(prompt)

# Usage
rag = SimpleRAG(documents=my_document_corpus, llm_client=llm)
answer = rag.generate("What are the symptoms of condition X?")

| Decision | Options | Tradeoffs |
|---|---|---|
| Chunk size | 100-2000 tokens | Smaller = precise, larger = more context |
| Chunk overlap | 0-50% | More overlap = redundancy but fewer boundary issues |
| Number of chunks (k) | 1-20 | More = more context, but dilutes relevance |
| Retrieval method | Dense, sparse, hybrid | Dense = semantic, sparse = keyword, hybrid = both |
| Reranking | None, cross-encoder | Improves relevance but adds latency |
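A minimal sketch of the chunking decision in the first two rows of the table, approximating tokens with whitespace-split words (a real pipeline would count with the model's tokenizer):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size word windows with a fixed overlap."""
    words = text.split()
    step = chunk_size - overlap              # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break                            # last window reached the end
    return chunks

doc = " ".join(f"w{i}" for i in range(500))
chunks = chunk_text(doc, chunk_size=200, overlap=50)
# Windows start at word 0, 150, 300 -> 3 chunks; the last 50 words
# of each chunk repeat as the first 50 of the next.
```

The overlap trades redundancy (the same words are embedded twice) for robustness: a sentence split by a chunk boundary still appears whole in one of the two overlapping windows.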
Query Transformation
Rewrite or expand the query before retrieval, for example by generating paraphrases or a hypothetical answer to embed, so the query better matches how the documents are phrased.
Hierarchical Retrieval
Retrieve coarse-to-fine: first select relevant documents or sections (often via summaries), then retrieve specific chunks within them.
Self-Reflective RAG
Let the model assess its own retrieval: decide whether retrieval is needed, critique the relevance of retrieved passages, and re-retrieve when the context is insufficient.
RAG can fail silently: if retrieved documents are irrelevant, the LLM may incorporate them anyway, producing confident-sounding nonsense. Always include instructions for the model to indicate when context is insufficient, and consider confidence calibration.
Beyond basic prompting, advanced techniques enable complex, multi-step AI applications.
Modern LLMs can be taught to invoke external tools:
tools = [
{
"name": "calculator",
"description": "Perform mathematical calculations",
"parameters": {
"expression": {"type": "string", "description": "Math expression"}
}
},
{
"name": "web_search",
"description": "Search the web for current information",
"parameters": {
"query": {"type": "string", "description": "Search query"}
}
}
]
prompt = """
You have access to the following tools: {tools}
To use a tool, respond with: <tool>tool_name(param=value)</tool>
Question: What is the current stock price of AAPL multiplied by 1.15?
"""
The agent loop: the model generates until it emits a tool call; the runtime parses and executes the call; the result is appended to the context as an observation; generation then resumes, repeating until the model produces a final answer instead of a tool call.
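A minimal sketch of this loop, assuming the `<tool>...</tool>` convention from the prompt above. `llm_generate` and the `tools` registry are placeholders; production systems use structured function-calling APIs rather than regex parsing.

```python
import re

def run_agent(llm_generate, tools: dict, prompt: str, max_steps: int = 5) -> str:
    """Generate -> detect tool call -> execute -> append observation -> repeat."""
    transcript = prompt
    output = ""
    for _ in range(max_steps):
        output = llm_generate(transcript)
        transcript += "\n" + output
        match = re.search(r"<tool>(\w+)\((.*?)\)</tool>", output)
        if match is None:
            return output                          # no tool call: final answer
        name, arg = match.group(1), match.group(2)
        observation = tools[name](arg)             # run the requested tool
        transcript += f"\nObservation: {observation}"
    return output                                  # step budget exhausted

# Toy run with a scripted "model" and a calculator tool
script = iter(["<tool>calculator(2+3)</tool>", "Answer: 5"])
tools = {"calculator": lambda expr: eval(expr)}   # toy only; never eval untrusted input
run_agent(lambda _: next(script), tools, "Q: What is 2+3?")  # -> "Answer: 5"
```

The `max_steps` cap matters: without it, a model that keeps emitting tool calls loops forever.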
Maintaining context across turns:
conversation = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is machine learning?"},
{"role": "assistant", "content": "Machine learning is..."},
{"role": "user", "content": "How does gradient descent work in it?"}
]
# Context window limits require summarization or retrieval
def manage_context(conversation, max_tokens=4000):
"""Keep recent turns, summarize older context."""
if estimate_tokens(conversation) > max_tokens:
# Summarize older turns
summary = llm.summarize(conversation[1:-4]) # Keep system + recent
return [
conversation[0], # System
{"role": "system", "content": f"Previous context: {summary}"},
*conversation[-4:] # Recent turns
]
return conversation
Combines chain-of-thought reasoning with tool use:
Question: What is the population of the hometown of the current president of France?
Thought: I need to find who the current president of France is.
Action: web_search(query="current president of France 2024")
Observation: Emmanuel Macron is the current president of France.
Thought: Now I need to find Macron's hometown.
Action: web_search(query="Emmanuel Macron hometown")
Observation: Emmanuel Macron was born in Amiens, France.
Thought: Now I need to find the population of Amiens.
Action: web_search(query="Amiens France population")
Observation: Amiens has a population of approximately 135,000.
Thought: I now have all the information needed.
Answer: Amiens, the hometown of French President Emmanuel Macron,
has a population of approximately 135,000.
Agentic systems compound LLM unreliability. A 90% accurate model making 10 sequential decisions has only 35% chance of all being correct. Build in verification, fallbacks, and human oversight for high-stakes applications.
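The 35% figure is simply independent per-step reliability compounded across steps:

```python
# Probability that 10 independent 90%-reliable steps all succeed
p_step, n_steps = 0.9, 10
p_all = p_step ** n_steps
print(f"{p_all:.0%}")  # -> 35%
```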
For production systems, prompts should be systematically optimized rather than hand-crafted.
class PromptEvaluator:
    def __init__(self, test_cases: list[dict], metrics: list[callable]):
        self.test_cases = test_cases  # [{input, expected, ...}]
        self.metrics = metrics        # [accuracy_fn, f1_fn, ...]

    def evaluate(self, prompt_template: str, model) -> dict:
        results = []
        for case in self.test_cases:
            prompt = prompt_template.format(**case)
            output = model.generate(prompt)
            scores = {m.__name__: m(output, case['expected'])
                      for m in self.metrics}
            results.append(scores)
        # Aggregate: mean of each metric across all test cases
        return {name: sum(r[name] for r in results) / len(results)
                for name in results[0]}
Evaluation workflow: assemble a representative test set, define metrics (exact match, similarity, or an LLM judge), score each prompt variant on the full set, compare results, and iterate on the failure cases.
| Technique | Description | When to Use |
|---|---|---|
| Manual iteration | Human refinement based on failure analysis | Initial development, understanding failure modes |
| A/B testing | Compare variants on traffic/test set | Production optimization |
| DSPy | Automatic prompt optimization via LLM | Scalable, reproducible optimization |
| OPRO | LLM generates and scores prompt variants | When many iterations affordable |
| Gradient-based | Soft prompt tuning, continuous optimization | When model access available |
DSPy treats prompts as programs that can be automatically optimized:
import dspy

class RAGSignature(dspy.Signature):
    """Answer questions using retrieved context."""
    context = dspy.InputField(desc="retrieved documents")
    question = dspy.InputField(desc="user question")
    answer = dspy.OutputField(desc="answer based on context")

class RAGModule(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought(RAGSignature)

    def forward(self, context, question):
        return self.generate(context=context, question=question)

# Compile (optimize) the module
from dspy.teleprompt import BootstrapFewShot

teleprompter = BootstrapFewShot(metric=accuracy_metric)
optimized_rag = teleprompter.compile(RAGModule(), trainset=examples)
# Optimized module has learned effective prompts
Key insight: Rather than manually crafting prompts, define the structure (inputs, outputs, modules) and let optimization find the best prompts.
Treat prompts as code: document them, test them, version them, review changes. A 'quick prompt fix' in production can cause cascading failures. Maintain a test suite for your prompts just as you would for software.
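One way to sketch this discipline: a versioned prompt template plus a tiny regression suite. The template, the test cases, and the `model_generate` callable are illustrative placeholders.

```python
SENTIMENT_PROMPT_V2 = (
    "Classify movie review sentiment as Positive or Negative.\n"
    "Review: {review}\nSentiment:"
)

TEST_CASES = [
    {"review": "I loved every minute of this masterpiece!", "expected": "Positive"},
    {"review": "Boring and predictable. I wanted my money back.", "expected": "Negative"},
]

def run_prompt_suite(model_generate, template: str, cases: list[dict]) -> float:
    """Return the fraction of regression cases the prompt still passes."""
    passed = 0
    for case in cases:
        output = model_generate(template.format(review=case["review"]))
        passed += case["expected"].lower() in output.lower()
    return passed / len(cases)

# Gate prompt changes in CI, e.g.:
# assert run_prompt_suite(llm, SENTIMENT_PROMPT_V2, TEST_CASES) >= 0.95
```

Running such a suite before shipping any prompt edit catches the cascading failures a "quick prompt fix" can introduce.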
Prompting is the primary interface to large language models—the craft of translating human intent into AI behavior. Mastery of prompting techniques unlocks the full potential of these powerful systems.
Module Complete:
You have now completed the Large Language Models module. From transformer scaling through pre-training objectives, instruction tuning, RLHF alignment, and prompting techniques—you understand the complete modern LLM pipeline.
This knowledge enables you to work with every stage of the modern LLM pipeline: scaling decisions, pre-training objectives, instruction tuning, RLHF alignment, and prompting.
The field continues to evolve rapidly, but these fundamentals provide a foundation for understanding new developments as they emerge.