Overview
When language models predict the next token, they output probability distributions over the entire vocabulary. But how do we convert these probabilities into actual text? This lesson explores deterministic generation methods—approaches that make predictable, reproducible choices about which tokens to select from transformer-based language models.
Understanding these foundational methods is crucial before we explore more creative sampling techniques. Think of this as learning to walk before we run: mastering predictable text generation gives us the foundation to understand when and why we might want to introduce randomness in language model outputs.
Learning Objectives
After completing this lesson, you will be able to:
- Understand the fundamental challenge of converting probabilities to text
- Implement and explain greedy search generation
- Implement and explain beam search generation
- Compare the trade-offs between different deterministic approaches
- Choose the right method for specific use cases
- Debug common issues in deterministic generation
The Core Challenge: From Probabilities to Words
The Decision Point
Every time a language model generates text, it faces the same fundamental challenge at each step:
```
Model output:  [0.4, 0.3, 0.15, 0.1, 0.05, ...]
Vocabulary:    ["the", "a", "an", "this", "that", ...]
Decision:      Which word do we actually choose?
```
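To make the decision concrete, here is a minimal sketch using the toy distribution above (the five-word vocabulary and probabilities are made up for illustration). Greedy selection reduces to taking the argmax of the distribution:

```python
# A toy distribution over a tiny, hypothetical vocabulary
probs = [0.4, 0.3, 0.15, 0.1, 0.05]
vocab = ["the", "a", "an", "this", "that"]

# Greedy choice: take the index of the highest probability
best_index = max(range(len(probs)), key=lambda i: probs[i])
print(vocab[best_index])  # -> "the"
```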
Mental Model: The Path Through a Forest
Imagine text generation as walking through a dense forest where every step forward reveals multiple possible paths:
- Starting point: Your prompt (the trailhead)
- Each step: Choosing the next word
- Path options: All possible next words, each with a "difficulty score" (inverse of probability)
- Goal: Reach a meaningful destination (complete text)
Different generation strategies represent different hiking philosophies:
- Greedy: Always take the easiest visible path
- Beam Search: Scout ahead on multiple promising paths, then choose the best overall route
Greedy Search: The Simplest Approach
Core Concept
Greedy search always chooses the most probable next token. It's called "greedy" because it makes locally optimal choices without considering future consequences.
Algorithm in Plain English:
- Look at all possible next words
- Pick the one with the highest probability
- Add it to your text
- Repeat until you decide to stop
Visualization: Greedy Path Selection
Decoding Path Visualization
The decoding path shows token probabilities at each step and the selected token (highlighted). This visualization demonstrates how the model makes decisions during text generation.
Python Implementation
```python
import torch

def greedy_search(model, tokenizer, prompt, max_length=50):
    """
    Generate text using greedy search (always pick most likely token).

    Args:
        model: The language model
        tokenizer: Tokenizer for encoding/decoding
        prompt: Starting text
        max_length: Maximum tokens to generate
    """
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    for _ in range(max_length):
        with torch.no_grad():
            logits = model(input_ids).logits[0, -1, :]
        # Locally optimal choice: the single most probable next token
        next_token = torch.argmax(logits).view(1, 1)
        input_ids = torch.cat([input_ids, next_token], dim=1)
        # Stop once the model emits its end-of-sequence token
        if next_token.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)
```
When Greedy Search Works Well
✅ Excellent for:
- Factual question answering: "What is the capital of France?" → "Paris"
- Short completions: Where there's usually one best answer
- Translation of common phrases: Well-established translations
- Code completion: Where syntax correctness matters most
Example outputs where greedy excels:
Input: "The chemical symbol for water is" Greedy: "H2O, consisting of two hydrogen atoms and one oxygen atom." ✅ Perfect! Factual and correct. Input: "To install numpy, run" Greedy: "pip install numpy" ✅ Exactly what we want!
When Greedy Search Fails
❌ Problems arise with:
- Creative writing: Produces boring, predictable text
- Long text generation: Gets stuck in repetitive loops
- Open-ended questions: Gives generic responses
Example where greedy fails:
Input: "Once upon a time, in a magical forest" Greedy: "there was a little girl who was walking through the forest. She was walking through the forest when she saw a little girl walking through the forest..." ❌ Repetitive and boring!
The Repetition Problem
Greedy search often gets trapped in loops because:
- Common words have high probability
- Once generated, they increase probability of similar patterns
- No mechanism to avoid repetition
Demonstration:
```python
# This often happens with greedy search
prompt = "I think that"
# Greedy might produce: "I think that I think that I think that..."
```
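Because nothing in the decoding loop discourages reuse, a simple check for repeated n-grams is often enough to flag this failure mode. Here is a minimal sketch; the function name and the choice of n=3 are illustrative, not part of any library:

```python
def has_repeated_ngram(text, n=3):
    """Return True if any n-gram of words appears more than once in the text."""
    words = text.split()
    seen = set()
    for i in range(len(words) - n + 1):
        ngram = tuple(words[i:i + n])
        if ngram in seen:
            return True
        seen.add(ngram)
    return False

print(has_repeated_ngram("I think that I think that I think that"))  # True
```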
Beam Search: Exploring Multiple Paths
Core Concept
Beam search improves on greedy search by considering multiple possible sequences simultaneously. Instead of committing to one path, it keeps track of the top-K most promising sequences (called "beams").
Key Innovation: Look ahead before making final decisions.
The Beam Width Parameter
- Beam width = 1: Equivalent to greedy search
- Beam width = 3: Keep track of 3 best sequences
- Beam width = 5: Keep track of 5 best sequences
- Higher beam width: More exploration, more computation
Visualization: Beam Search Tree
Beam Search Visualization
Beam search maintains multiple candidate sequences (beams) at each step, choosing the most likely continuations based on cumulative probability.
Algorithm Walkthrough
Let's trace through beam search step by step:
Step 1: Start with prompt
Prompt: "The future of AI" Beams: ["The future of AI"]
Step 2: Generate first token
```
Top candidates: "is", "will", "depends"
Beams: ["The future of AI is",
        "The future of AI will",
        "The future of AI depends"]
```
Step 3: Generate second token (from each beam)
From "...is": "bright", "uncertain", "promising" From "...will": "be", "depend", "involve" From "...depends": "on", "heavily", "largely" Keep top 3 overall: 1. "The future of AI is bright" 2. "The future of AI will be" 3. "The future of AI depends on"
Python Implementation
```python
import torch

def beam_search(model, tokenizer, prompt, beam_width=5, max_length=50):
    """
    Generate text using beam search.

    Args:
        beam_width: Number of sequences to track simultaneously
    """
    # Encode the starting prompt; each beam is (token_ids, cumulative log-probability)
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    beams = [(input_ids, 0.0)]
    for _ in range(max_length):
        candidates = []
        for seq, score in beams:
            with torch.no_grad():
                log_probs = torch.log_softmax(model(seq).logits[0, -1, :], dim=-1)
            # Expand each beam with its beam_width most likely continuations
            top_lp, top_ids = torch.topk(log_probs, beam_width)
            for lp, tok in zip(top_lp, top_ids):
                new_seq = torch.cat([seq, tok.view(1, 1)], dim=1)
                candidates.append((new_seq, score + lp.item()))
        # Keep only the beam_width best sequences by cumulative log-probability
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    # Return the single highest-scoring sequence
    return tokenizer.decode(beams[0][0][0], skip_special_tokens=True)
```
Beam Search Advantages
✅ Better than greedy because:
- Avoids local optima: Considers multiple paths before deciding
- Higher quality output: Often more coherent and fluent
- Still deterministic: Same input always gives same output
- Configurable exploration: Adjust beam width for your needs
Comparison example:
Prompt: "The solution to climate change requires" Greedy: "a global effort to reduce carbon emissions." Beam (width=5): "coordinated international cooperation, technological innovation, and fundamental changes in how we produce and consume energy."
Beam Search Limitations
❌ Still has issues:
- Computational cost: Roughly beam_width times more expensive than greedy (5x with beam width 5)
- Generic output: Tends toward "safe" but boring completions
- Length bias: Favors shorter sequences, because each additional token multiplies in another probability below 1 (a length-normalization sketch follows this list)
- Still deterministic: No creativity or surprise
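The standard mitigation for length bias is to rank beams by a length-normalized score rather than the raw sum of log-probabilities. Below is a minimal sketch of that re-ranking step; the beams, scores, and alpha value are invented for illustration (Hugging Face's generate exposes a related length_penalty argument):

```python
def length_normalized_score(cumulative_log_prob, num_tokens, alpha=0.7):
    """Divide the cumulative log-probability by length**alpha so longer
    sequences are not automatically penalized (alpha=0 recovers raw scores)."""
    return cumulative_log_prob / (num_tokens ** alpha)

# Two hypothetical finished beams: (tokens, cumulative log-probability)
beams = [(["The", "future", "is", "bright"], -4.2),
         (["The", "future", "of", "AI", "will", "be", "bright"], -5.8)]

# Raw scores favor the shorter beam; normalized scores pick the longer one
best_raw = max(beams, key=lambda b: b[1])
best_norm = max(beams, key=lambda b: length_normalized_score(b[1], len(b[0])))
print(" ".join(best_raw[0]))   # The future is bright
print(" ".join(best_norm[0]))  # The future of AI will be bright
```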
Beam Width Selection Guide
| Task Type | Recommended Beam Width | Reasoning |
|---|---|---|
| Translation | 4-6 | Balance quality and computation |
| Summarization | 3-5 | Want coherent but not too generic |
| Question Answering | 1-3 | Usually one correct answer |
| Creative Writing | Not recommended | Too conservative |
| Code Generation | 2-4 | Syntax correctness important |
Direct Comparison: Greedy vs. Beam Search
Side-by-Side Examples
Sampling Methods Comparison
Different methods produce different outputs from the same prompt. The optimal sampling strategy depends on your specific application and requirements for creativity vs. predictability.
Quality Metrics Comparison
| Aspect | Greedy Search | Beam Search (width=5) |
|---|---|---|
| Fluency | Good | Very Good |
| Coherence | Good | Very Good |
| Creativity | Poor | Poor |
| Consistency | High | High |
| Speed | Fast | 5x Slower |
| Memory Usage | Low | 5x Higher |
When to Choose Each Method
Choose Greedy Search when:
- Speed is critical
- Memory is limited
- Task has clear "correct" answers
- Generating short completions
- Prototyping or debugging
Choose Beam Search when:
- Quality is more important than speed
- Task benefits from planning ahead (translation, summarization)
- You have computational resources
- Output length is moderate (20-100 tokens)
- Serving fewer requests but need higher quality
Practical Implementation with Hugging Face
Modern libraries make these methods easy to use:
```python
from transformers import pipeline, GPT2LMHeadModel, GPT2Tokenizer

# Setup
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)

prompt = "The future of AI"

# Greedy search (do_sample=False, num_beams=1)
greedy_output = generator(prompt, max_length=50, do_sample=False, num_beams=1)

# Beam search (do_sample=False, num_beams=5)
beam_output = generator(prompt, max_length=50, do_sample=False, num_beams=5,
                        early_stopping=True)

print(greedy_output[0]['generated_text'])
print(beam_output[0]['generated_text'])
```
Key Parameters Explained
- do_sample=False: Use deterministic methods (greedy/beam)
- num_beams=1: Greedy search
- num_beams>1: Beam search with the specified width
- early_stopping=True: End beam search as soon as enough complete candidates (ending in EOS) are found
- num_return_sequences: How many different outputs to return (demonstrated in the sketch below)
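To see these parameters working together, here is a short sketch reusing the generator built above. It asks beam search to return its three highest-scoring beams via num_return_sequences; the prompt is just an example:

```python
# Beam search with width 5, returning the 3 highest-scoring beams
outputs = generator(
    "The solution to climate change requires",
    max_length=40,
    do_sample=False,
    num_beams=5,
    num_return_sequences=3,  # must be <= num_beams for beam search
    early_stopping=True,
)
for i, out in enumerate(outputs, 1):
    print(f"Beam {i}: {out['generated_text']}")
```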
Common Issues and Debugging
Problem 1: Repetitive Output
python# Symptoms: "I think that I think that I think..." # Solutions: - Try beam search instead of greedy - Add length constraints - Use repetition penalty (covered in next lesson)
Problem 2: Truncated Output
```python
# Symptoms: Output stops too early
# Solutions:
#   - Increase max_length parameter
#   - Check for unexpected EOS tokens
#   - Verify tokenizer settings
```
Problem 3: Generic Output
```python
# Symptoms: Boring, predictable text
# Solutions:
#   - This is expected with deterministic methods
#   - Consider probabilistic sampling (next lesson)
#   - Adjust beam width
```
Problem 4: High Memory Usage
```python
# Symptoms: Out of memory errors with beam search
# Solutions:
#   - Reduce beam width
#   - Process shorter sequences
#   - Use a smaller model or smaller batches
#   - Switch to greedy for prototyping
```
Summary and Next Steps
What We've Learned
- Greedy Search: Simple, fast, but can get stuck in loops
- Beam Search: Better quality through exploration, but more expensive
- Trade-offs: Speed vs. quality, determinism vs. creativity
- Use Cases: When to choose each method
The Limitation of Deterministic Methods
Both greedy and beam search share a fundamental limitation: they're too conservative. They always choose the "safe" options, leading to:
- Predictable, sometimes boring text
- Lack of creativity and surprise
- Difficulty with open-ended creative tasks
What's Next
In our next lesson, we'll explore probabilistic sampling techniques that introduce controlled randomness:
- Temperature sampling: Adding creativity while maintaining quality
- Top-k sampling: Limiting choices to reasonable options
- Nucleus (top-p) sampling: Dynamically adjusting choice sets
- Combining techniques: Building production-ready systems
These methods will give us the tools to generate more interesting, creative, and diverse text while still maintaining quality and coherence.
Practice Exercises
Exercise 1: Implementation Challenge
Implement both greedy and beam search from scratch (without using Hugging Face's built-in methods). Compare your results with the library versions.
Exercise 2: Parameter Exploration
Test beam search with different beam widths (1, 3, 5, 10) on the same prompt (a timing harness sketch follows this exercise). Analyze how beam width affects:
- Output quality
- Generation time
- Memory usage
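A minimal timing harness for this exercise might look like the sketch below; the model, tokenizer, and prompt are assumed to be set up as in the Hugging Face example above:

```python
import time
import torch

prompt = "The future of AI"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

for width in [1, 3, 5, 10]:
    start = time.time()
    with torch.no_grad():
        output = model.generate(
            input_ids,
            max_length=60,
            do_sample=False,
            num_beams=width,
        )
    elapsed = time.time() - start
    print(f"beam width {width:>2}: {elapsed:.2f}s")
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```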
Exercise 3: Use Case Analysis
For each scenario below, decide whether to use greedy search or beam search and justify your choice:
- Real-time chatbot responses
- Academic paper summarization
- Code completion in an IDE
- News article generation
- Poetry writing assistant
Exercise 4: Debugging Practice
Given these problematic outputs, identify the likely issue and propose solutions:
- "The the the the the the..."
- Output that stops after just 3 tokens
- Very generic, dictionary-like responses
- Out of memory errors
Additional Resources
- Hugging Face Text Generation Guide
- The Illustrated GPT-2 - Great visualization of the generation process
- Beam Search Implementation Tutorial
- Google's Neural Machine Translation System - Where beam search shines
Common Issues and Solutions
Repetition Problems:
- Use repetition penalty
- Consider probabilistic sampling techniques (covered in our next lesson)
Quality vs. Speed Trade-offs:
- Beam search: Better quality, slower
- Greedy search: Faster, potentially lower quality
- Consider probabilistic sampling for creative applications (next lesson)
When to Use Deterministic Methods
Ideal for:
- Factual question answering where accuracy is paramount
- Machine translation where consistency matters
- Summarization tasks requiring faithful representation
- Any application where deterministic, reproducible outputs are required
Less ideal for:
- Creative writing where diversity and surprise are valued
- Conversational AI where natural variation is important
- Brainstorming applications requiring multiple diverse ideas
In our next lesson, we'll explore probabilistic sampling techniques that introduce controlled randomness to address these limitations.