Text Generation: Deterministic Methods

Overview

When language models predict the next token, they output probability distributions over the entire vocabulary. But how do we convert these probabilities into actual text? This lesson explores deterministic generation methods—approaches that make predictable, reproducible choices about which tokens to select from transformer-based language models.

Understanding these foundational methods is crucial before we explore more creative sampling techniques. Think of this as learning to walk before we run: mastering predictable text generation gives us the foundation to understand when and why we might want to introduce randomness in language model outputs.

Learning Objectives

After completing this lesson, you will be able to:

  • Understand the fundamental challenge of converting probabilities to text
  • Implement and explain greedy search generation
  • Implement and explain beam search generation
  • Compare the trade-offs between different deterministic approaches
  • Choose the right method for specific use cases
  • Debug common issues in deterministic generation

The Core Challenge: From Probabilities to Words

The Decision Point

Every time a language model generates text, it faces the same fundamental challenge at each step:

Model output: [0.4, 0.3, 0.15, 0.1, 0.05, ...]
Vocabulary:   ["the", "a", "an", "this", "that", ...]
Decision:     Which word do we actually choose?
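
To make the decision point concrete, here is a minimal sketch using the made-up probabilities above (illustrative numbers, not real model output). It shows the simplest possible decision rule: pick the token with the highest probability.

python
# Toy example: probabilities and vocabulary are illustrative, not real model output
probs = [0.4, 0.3, 0.15, 0.1, 0.05]
vocab = ["the", "a", "an", "this", "that"]

# The simplest decision rule: take the token with the highest probability
best_index = max(range(len(probs)), key=lambda i: probs[i])
print(vocab[best_index])  # -> "the"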

Mental Model: The Path Through a Forest

Imagine text generation as walking through a dense forest where every step forward reveals multiple possible paths:

  • Starting point: Your prompt (the trailhead)
  • Each step: Choosing the next word
  • Path options: All possible next words, each with a "difficulty score" (inverse of probability)
  • Goal: Reach a meaningful destination (complete text)

Different generation strategies represent different hiking philosophies:

  • Greedy: Always take the easiest visible path
  • Beam Search: Scout ahead on multiple promising paths, then choose the best overall route

Greedy Search: The Simplest Approach

Core Concept

Greedy search always chooses the most probable next token. It's called "greedy" because it makes locally optimal choices without considering future consequences.

Algorithm in Plain English:

  1. Look at all possible next words
  2. Pick the one with the highest probability
  3. Add it to your text
  4. Repeat until you decide to stop

Visualization: Greedy Path Selection

Decoding Path Visualization

Prompt: "Language models can be used to"
Generated sequence (greedy): "Language models can be used to the in he can beneficial"

Candidate tokens and probabilities at each position (the selected, highest-probability token is listed first):

Position 1: the (50.0%), is (16.9%), in (11.3%), a (7.1%), was (5.9%)
Position 2: in (60.0%), to (32.9%), and (4.9%), that (1.0%), she (0.3%)
Position 3: he (70.0%), she (24.4%), we (3.1%), will (1.4%), they (1.2%)
Position 4: can (80.0%), should (8.5%), might (7.9%), useful (2.1%), important (1.0%)
Position 5: beneficial (90.0%), crucial (9.6%), helpful (0.1%), necessary (0.1%), essential (0.0%)

The decoding path shows the candidate probabilities at each step and the token that greedy search selects (always the most probable one), illustrating how the model makes its decision at every position during generation.

Python Implementation

python
import torch

def greedy_search(model, tokenizer, prompt, max_length=50):
    """
    Generate text using greedy search (always pick the most likely token).

    Args:
        model: The language model (a Hugging Face causal LM is assumed here)
        tokenizer: Tokenizer for encoding/decoding
        prompt: Starting text
        max_length: Maximum tokens to generate
    """
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    for _ in range(max_length):
        with torch.no_grad():
            logits = model(input_ids).logits  # (batch, seq_len, vocab_size)
        # Pick the single most probable next token (the "greedy" choice)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        # Stop if the model produces its end-of-sequence token
        if next_token.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)
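
A quick usage sketch, assuming a GPT-2 model loaded through Hugging Face Transformers (the model name and prompt are just examples):

python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Same prompt, same model, same settings -> always the same output
print(greedy_search(model, tokenizer, "The capital of France is", max_length=10))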

When Greedy Search Works Well

✅ Excellent for:

  • Factual question answering: "What is the capital of France?" → "Paris"
  • Short completions: Where there's usually one best answer
  • Translation of common phrases: Well-established translations
  • Code completion: Where syntax correctness matters most

Example outputs where greedy excels:

Input: "The chemical symbol for water is" Greedy: "H2O, consisting of two hydrogen atoms and one oxygen atom." ✅ Perfect! Factual and correct. Input: "To install numpy, run" Greedy: "pip install numpy" ✅ Exactly what we want!

When Greedy Search Fails

❌ Problems arise with:

  • Creative writing: Produces boring, predictable text
  • Long text generation: Gets stuck in repetitive loops
  • Open-ended questions: Gives generic responses

Example where greedy fails:

Input: "Once upon a time, in a magical forest" Greedy: "there was a little girl who was walking through the forest. She was walking through the forest when she saw a little girl walking through the forest..." ❌ Repetitive and boring!

The Repetition Problem

Greedy search often gets trapped in loops because:

  1. Common words have high probability
  2. Once generated, they increase the probability of similar patterns recurring
  3. No mechanism to avoid repetition

Demonstration:

python
# This often happens with greedy search
prompt = "I think that"
# Greedy might produce: "I think that I think that I think that..."

Beam Search: Exploring Multiple Paths

Core Concept

Beam search improves on greedy search by considering multiple possible sequences simultaneously. Instead of committing to one path, it keeps track of the top-K most promising sequences (called "beams").

Key Innovation: Look ahead before making final decisions.

The Beam Width Parameter

  • Beam width = 1: Equivalent to greedy search
  • Beam width = 3: Keep track of 3 best sequences
  • Beam width = 5: Keep track of 5 best sequences
  • Higher beam width: More exploration, more computation

Visualization: Beam Search Tree

Beam Search Visualization

Prompt: "Natural language processing has"
Parameters: beam width = 3, max depth = 3

Final beams, ranked by score:

  • Beam 1 (score 1.000): "Natural language processing has the was at"
  • Beam 2 (score 0.900): "Natural language processing has the was this"
  • Beam 3 (score 0.900): "Natural language processing has the on at"

Beam search maintains multiple candidate sequences (beams) at each step, choosing the most likely continuations based on cumulative probability.

Algorithm Walkthrough

Let's trace through beam search step by step:

Step 1: Start with prompt

Prompt: "The future of AI" Beams: ["The future of AI"]

Step 2: Generate first token

Top candidates: "is", "will", "depends"
Beams: ["The future of AI is", "The future of AI will", "The future of AI depends"]

Step 3: Generate second token (from each beam)

From "...is": "bright", "uncertain", "promising" From "...will": "be", "depend", "involve" From "...depends": "on", "heavily", "largely" Keep top 3 overall: 1. "The future of AI is bright" 2. "The future of AI will be" 3. "The future of AI depends on"

Python Implementation

python
import torch
import torch.nn.functional as F

def beam_search(model, tokenizer, prompt, beam_width=5, max_length=50):
    """
    Generate text using beam search.

    Args:
        beam_width: Number of sequences to track simultaneously
    """
    # Encode the starting prompt
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    # Each beam is a (sequence, cumulative log-probability) pair
    beams = [(input_ids, 0.0)]
    for _ in range(max_length):
        candidates = []
        for seq, score in beams:
            with torch.no_grad():
                log_probs = F.log_softmax(model(seq).logits[:, -1, :], dim=-1)
            # Expand this beam with its `beam_width` most likely continuations
            top_log_probs, top_ids = log_probs.topk(beam_width, dim=-1)
            for lp, token_id in zip(top_log_probs[0], top_ids[0]):
                new_seq = torch.cat([seq, token_id.view(1, 1)], dim=-1)
                candidates.append((new_seq, score + lp.item()))
        # Keep only the `beam_width` candidates with the best cumulative score
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    # Return the highest-scoring beam
    best_seq, _ = beams[0]
    return tokenizer.decode(best_seq[0], skip_special_tokens=True)
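
A usage sketch comparing the two methods on the same prompt, again assuming a GPT-2 model from Hugging Face Transformers and the greedy_search function defined earlier (purely illustrative setup):

python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

prompt = "The future of AI"
print("Greedy:", greedy_search(model, tokenizer, prompt, max_length=20))
print("Beam:  ", beam_search(model, tokenizer, prompt, beam_width=5, max_length=20))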

Beam Search Advantages

✅ Better than greedy because:

  • Avoids local optima: Considers multiple paths before deciding
  • Higher quality output: Often more coherent and fluent
  • Still deterministic: Same input always gives same output
  • Configurable exploration: Adjust beam width for your needs

Comparison example:

Prompt: "The solution to climate change requires" Greedy: "a global effort to reduce carbon emissions." Beam (width=5): "coordinated international cooperation, technological innovation, and fundamental changes in how we produce and consume energy."

Beam Search Limitations

❌ Still has issues:

  • Computational cost: 5x more expensive than greedy (with beam width 5)
  • Generic output: Tends toward "safe" but boring completions
  • Length bias: Favors shorter sequences, because every additional token multiplies in another probability below 1 (see the sketch after this list)
  • Still deterministic: No creativity or surprise
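
To see the length bias concretely, compare the raw cumulative log-probability of a short and a long candidate. Libraries typically counteract this by dividing the score by the sequence length raised to some power (the probabilities and the normalization exponent below are illustrative assumptions, not fixed values):

python
import math

# Hypothetical per-token probabilities for a short and a long candidate
short_seq = [0.5, 0.5]                 # 2 tokens
long_seq  = [0.5, 0.5, 0.5, 0.5, 0.5]  # 5 tokens, each individually just as likely

def raw_score(probs):
    return sum(math.log(p) for p in probs)

def length_normalized_score(probs, alpha=1.0):
    # Divide by length**alpha so longer sequences are not automatically penalized
    return raw_score(probs) / (len(probs) ** alpha)

print(raw_score(short_seq), raw_score(long_seq))  # raw scores: the short sequence wins
print(length_normalized_score(short_seq), length_normalized_score(long_seq))  # normalized: a tie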

Beam Width Selection Guide

Task Type          | Recommended Beam Width | Reasoning
Translation        | 4-6                    | Balance quality and computation
Summarization      | 3-5                    | Want coherent but not too generic
Question Answering | 1-3                    | Usually one correct answer
Creative Writing   | Not recommended        | Too conservative
Code Generation    | 2-4                    | Syntax correctness important

Direct Comparison: Greedy vs. Beam Search

Side-by-Side Examples

Sampling Methods Comparison

Prompt: "The most interesting aspect of machine learning is"

Methods compared (the probabilistic methods are previewed here and covered in detail in the next lesson):

  • Greedy (default parameters): deterministic; fluent but limited diversity
  • Beam Search (num_beams=5): more comprehensive exploration; still deterministic
  • Temperature (temperature=0.7): controls randomness; higher values give more diverse output
  • Top-K (k=50, temperature=1): prevents low-probability selections
  • Nucleus (p=0.9, temperature=1): adaptively selects the token pool
  • Combined (p=0.9, k=50, temperature=0.7): balanced quality and diversity

Different methods produce different outputs from the same prompt. The optimal sampling strategy depends on your specific application and requirements for creativity vs. predictability.

Quality Metrics Comparison

Aspect       | Greedy Search | Beam Search (width=5)
Fluency      | Good          | Very Good
Coherence    | Good          | Very Good
Creativity   | Poor          | Poor
Consistency  | High          | High
Speed        | Fast          | 5x Slower
Memory Usage | Low           | 5x Higher

When to Choose Each Method

Choose Greedy Search when:

  • Speed is critical
  • Memory is limited
  • Task has clear "correct" answers
  • Generating short completions
  • Prototyping or debugging

Choose Beam Search when:

  • Quality is more important than speed
  • Task benefits from planning ahead (translation, summarization)
  • You have computational resources
  • Output length is moderate (20-100 tokens)
  • Serving fewer requests but need higher quality

Practical Implementation with Hugging Face

Modern libraries make these methods easy to use:

python
from transformers import pipeline, GPT2LMHeadModel, GPT2Tokenizer

# Setup
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)

# Greedy search (do_sample=False, num_beams=1)
greedy_output = generator(
    "The future of AI",  # example prompt
    max_length=50,
    do_sample=False,
    num_beams=1,
)
print(greedy_output[0]["generated_text"])

Key Parameters Explained

  • do_sample=False: Use deterministic methods (greedy/beam)
  • num_beams=1: Greedy search
  • num_beams>1: Beam search with specified width
  • early_stopping=True: Stop beam search as soon as enough complete (EOS-terminated) candidates have been found
  • num_return_sequences: How many different outputs to return
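
A minimal sketch putting these parameters together for beam search, reusing the generator pipeline set up above (the prompt and parameter values are just illustrative choices):

python
# Beam search: deterministic, exploring 5 candidate sequences in parallel
beam_output = generator(
    "The future of AI",       # illustrative prompt
    max_length=50,
    do_sample=False,          # deterministic decoding
    num_beams=5,              # beam search with width 5
    early_stopping=True,      # finish early once complete candidates are found
    num_return_sequences=3,   # return the 3 best beams
)
for i, candidate in enumerate(beam_output):
    print(f"Beam {i + 1}: {candidate['generated_text']}")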

Common Issues and Debugging

Problem 1: Repetitive Output

python
# Symptoms: "I think that I think that I think..." # Solutions: - Try beam search instead of greedy - Add length constraints - Use repetition penalty (covered in next lesson)

Problem 2: Truncated Output

python
# Symptoms: Output stops too early
# Solutions:
#   - Increase the max_length parameter
#   - Check for unexpected EOS tokens
#   - Verify tokenizer settings

Problem 3: Generic Output

python
# Symptoms: Boring, predictable text
# Solutions:
#   - This is expected with deterministic methods
#   - Consider probabilistic sampling (next lesson)
#   - Adjust beam width

Problem 4: High Memory Usage

python
# Symptoms: Out-of-memory errors with beam search
# Solutions:
#   - Reduce beam width
#   - Process shorter sequences
#   - Use half precision or a smaller model
#   - Switch to greedy for prototyping

Summary and Next Steps

What We've Learned

  1. Greedy Search: Simple, fast, but can get stuck in loops
  2. Beam Search: Better quality through exploration, but more expensive
  3. Trade-offs: Speed vs. quality, determinism vs. creativity
  4. Use Cases: When to choose each method

The Limitation of Deterministic Methods

Both greedy and beam search share a fundamental limitation: they're too conservative. They always choose the "safe" options, leading to:

  • Predictable, sometimes boring text
  • Lack of creativity and surprise
  • Difficulty with open-ended creative tasks

What's Next

In our next lesson, we'll explore probabilistic sampling techniques that introduce controlled randomness:

  • Temperature sampling: Adding creativity while maintaining quality
  • Top-k sampling: Limiting choices to reasonable options
  • Nucleus (top-p) sampling: Dynamically adjusting choice sets
  • Combining techniques: Building production-ready systems

These methods will give us the tools to generate more interesting, creative, and diverse text while still maintaining quality and coherence.

Practice Exercises

Exercise 1: Implementation Challenge

Implement both greedy and beam search from scratch (without using Hugging Face's built-in methods). Compare your results with the library versions.

Exercise 2: Parameter Exploration

Test beam search with different beam widths (1, 3, 5, 10) on the same prompt. Analyze how beam width affects:

  • Output quality
  • Generation time
  • Memory usage

Exercise 3: Use Case Analysis

For each scenario below, decide whether to use greedy search or beam search and justify your choice:

  1. Real-time chatbot responses
  2. Academic paper summarization
  3. Code completion in an IDE
  4. News article generation
  5. Poetry writing assistant

Exercise 4: Debugging Practice

Given these problematic outputs, identify the likely issue and propose solutions:

  1. "The the the the the the..."
  2. Output that stops after just 3 tokens
  3. Very generic, dictionary-like responses
  4. Out of memory errors

Additional Resources

Common Issues and Solutions

Repetition Problems:

  • Use repetition penalty
  • Consider probabilistic sampling techniques (covered in our next lesson)

Quality vs. Speed Trade-offs:

  • Beam search: Better quality, slower
  • Greedy search: Faster, potentially lower quality
  • Consider probabilistic sampling for creative applications (next lesson)

When to Use Deterministic Methods

Ideal for:

  • Factual question answering where accuracy is paramount
  • Machine translation where consistency matters
  • Summarization tasks requiring faithful representation
  • Any application where deterministic, reproducible outputs are required

Less ideal for:

  • Creative writing where diversity and surprise are valued
  • Conversational AI where natural variation is important
  • Brainstorming applications requiring multiple diverse ideas
