mantus.ai


How do I configure AI models for consistent results?

Master temperature, top-K, and top-P settings to control creativity versus accuracy, and learn when to use different configurations for different tasks.

The difference between a random AI response and a useful one often comes down to three numbers: temperature, top-K, and top-P. These settings control how your AI model thinks—whether it gives you the same answer every time or surprises you with creative alternatives.

Think of an AI model like a prediction engine. When you ask it something, it doesn't just spit out one answer. It calculates probabilities for thousands of possible next words, then uses these three settings to pick which word actually appears. Get these settings right, and your AI becomes reliable. Get them wrong, and it either repeats itself endlessly or rambles into nonsense.

Temperature: The creativity dial

Temperature controls randomness. At 0, the model always picks the highest probability word—completely predictable. At higher values, it starts choosing less likely options, adding creativity but also unpredictability.

For factual tasks like summarizing documents or extracting data, use low temperatures around 0.1. The model focuses on accuracy over originality. For creative work like brainstorming or writing fiction, try temperatures around 0.7 to 0.9.

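Temperature extremes break things. Set it to 0 and the other settings become irrelevant, because the model picks the most likely word every time. Crank it well above 1 and the probability distribution flattens: unlikely words get picked far more often, and the output drifts toward incoherent gibberish. Only in the limit of very high temperatures do all words become close to equally likely.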
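To see what the dial does to the numbers, here is a minimal sketch of temperature scaling, assuming the common formulation where the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The function name and the four example scores are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature above 0."""
    if temperature <= 0:
        # Temperature 0 is usually handled as "pick the top word" instead.
        raise ValueError("temperature must be > 0")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for four candidate next words.
logits = [4.0, 3.0, 2.0, 1.0]
for temperature in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Run it and compare the three rows: at 0.1 nearly all the probability sits on the first word, while at 2.0 the gap between the candidates shrinks without disappearing.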

Top-K and top-P: Filtering options

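Top-K and top-P filter which words the model considers. In the pipeline described here they run before temperature kicks in, though some implementations apply the steps in a different order. They work like bouncers at a club, deciding who gets in.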

Top-K keeps only the K most likely words. Set top-K to 20, and the model only considers the 20 most probable next words, ignoring everything else. Higher top-K values give more creative freedom; lower values enforce more predictable responses.
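As a rough sketch, a top-K filter is just "sort, keep the first K, rescale". The function name and the tiny four-word vocabulary below are hypothetical; a real model works over many thousands of tokens.

```python
def top_k_filter(probs, k):
    """Keep the k most probable words and rescale them to sum to 1."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:k]
    total = sum(prob for _, prob in kept)
    return {word: prob / total for word, prob in kept}


# Hypothetical next-word probabilities.
next_word_probs = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "xylophone": 0.05}
print(top_k_filter(next_word_probs, 2))  # only "cat" and "dog" survive
```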

Top-P (nucleus sampling) takes a different approach. Instead of a fixed number of words, it includes words until their combined probability reaches P. With top-P at 0.9, the model considers however many words it takes to reach 90% total probability.

The difference matters. Top-K might include 20 words even if the top 3 words already have 95% probability. Top-P would stop at those 3 words, avoiding the noise of unlikely options.
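The contrast above can be reproduced with a small sketch. It assumes a made-up distribution where three words hold 95% of the probability and 22 filler words share the rest; the function name and vocabulary are illustrative only.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top-ranked words whose combined probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {word: prob / total for word, prob in kept}


# Hypothetical distribution: three strong candidates plus a long tail.
probs = {"the": 0.60, "a": 0.25, "this": 0.10}
probs.update({f"rare_{i}": 0.05 / 22 for i in range(22)})

top_k_survivors = sorted(probs, key=probs.get, reverse=True)[:20]
top_p_survivors = top_p_filter(probs, 0.9)

print(len(top_k_survivors))    # 20
print(sorted(top_p_survivors)) # ['a', 'the', 'this']
```

Top-K lets in 17 long-tail words simply because there is room for them; top-P stops as soon as the 90% threshold is covered.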

How they work together

These settings combine in sequence. First, top-K and top-P filter the candidate words—only words that pass both filters make the cut. Then temperature determines how to choose between the survivors.

This interaction creates some counterintuitive effects. If you set top-K to 1, temperature and top-P become meaningless—only one word survives the filter. Set top-P to 0, and most models treat only the highest probability word as valid.

Extreme settings in any direction can override the others. Understanding this hierarchy prevents wasted effort tweaking settings that aren't actually affecting your output.
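Here is a minimal sketch of the whole sequence as this section describes it: filter with top-K and top-P, then let temperature decide among the survivors. The function and its parameters are hypothetical, and real libraries may apply these steps in a different order or work on raw scores rather than probabilities.

```python
import random


def sample_next_word(probs, temperature=1.0, top_k=None, top_p=1.0, rng=random):
    """Pick one word: top-K and top-P filter first, temperature chooses after."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)

    # Top-K: keep at most k of the highest-ranked words.
    if top_k is not None:
        ranked = ranked[:max(1, top_k)]

    # Top-P: keep words until their combined original probability reaches top_p.
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Temperature 0, or a single survivor, leaves nothing to choose between.
    if temperature == 0 or len(kept) == 1:
        return kept[0][0]

    # Raising probabilities to 1/temperature matches dividing logits by temperature.
    # Dividing by the top probability first keeps the weights from underflowing.
    top_prob = kept[0][1]
    weights = [(prob / top_prob) ** (1.0 / temperature) for _, prob in kept]
    words = [word for word, _ in kept]
    return rng.choices(words, weights=weights, k=1)[0]


probs = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "xylophone": 0.05}
print(sample_next_word(probs, temperature=0))             # cat
print(sample_next_word(probs, temperature=5.0, top_k=1))  # cat
print(sample_next_word(probs, temperature=5.0, top_p=0))  # cat
```

The last three calls show the overrides in action: with temperature 0, top-K 1, or top-P 0, the other settings have no effect on the result.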

Configuration patterns that work

Start with these combinations, then adjust based on the output you get:

For consistent, factual responses: temperature 0.1, top-P 0.9, top-K 20. This setup gives you reliable answers with minimal randomness.

For balanced creativity: temperature 0.2, top-P 0.95, top-K 30. Good for tasks requiring some originality without going off the rails.

For maximum creativity: temperature 0.9, top-P 0.99, top-K 40. Use this for brainstorming, fiction writing, or exploring unusual angles.

For tasks with single correct answers: temperature 0, ignore the other settings. Perfect for math problems, code debugging, or structured data extraction.
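If you switch between tasks often, it can help to keep these starting points in one place. The sketch below stores them as named presets; the class, the preset names, and the field names are hypothetical and would need to be mapped onto whatever parameters your model provider actually exposes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SamplingConfig:
    temperature: float
    top_p: Optional[float] = None  # None means "leave at the provider default"
    top_k: Optional[int] = None


# Starting points taken from the patterns above.
PRESETS = {
    "factual": SamplingConfig(temperature=0.1, top_p=0.9, top_k=20),
    "balanced": SamplingConfig(temperature=0.2, top_p=0.95, top_k=30),
    "creative": SamplingConfig(temperature=0.9, top_p=0.99, top_k=40),
    # At temperature 0 the other settings do not matter, so they are left unset.
    "single_answer": SamplingConfig(temperature=0.0),
}


def config_for(task):
    """Look up a preset by name, failing loudly on unknown task types."""
    if task not in PRESETS:
        raise ValueError(f"unknown task type: {task!r}")
    return PRESETS[task]


print(config_for("creative"))
```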

The repetition loop trap

Poor configuration creates a common problem: repetition loops. The model gets stuck repeating the same phrase or word until it hits the output limit. This can happen at both temperature extremes. Too low, and the model becomes rigidly deterministic, so once it starts down a repeating path it keeps following it. Too high, and random selections can accidentally circle back to previous text.

Watch for responses that end with repeated filler words or circular logic. Adjust temperature first, then fine-tune top-K and top-P to find the sweet spot between creativity and coherence.
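If you are checking many responses, a crude automated test can flag the most obvious loops. The helper below is a hypothetical sketch, not a standard tool: it only checks whether the output ends with the same short phrase repeated back to back, and it compares words exactly, so punctuation differences will hide a loop from it.

```python
def ends_in_repetition(text, max_phrase_len=5, min_repeats=3):
    """Return True if the text ends with one phrase repeated back to back."""
    words = text.lower().split()
    for n in range(1, max_phrase_len + 1):
        needed = n * min_repeats
        if len(words) < needed:
            break  # longer phrases would need even more words
        tail = words[-needed:]
        phrase = tail[:n]
        # Compare every n-word chunk of the tail against the first chunk.
        if all(tail[i:i + n] == phrase for i in range(0, needed, n)):
            return True
    return False


print(ends_in_repetition("and so on and so on and so on"))  # True
print(ends_in_repetition("The cat sat on the mat."))        # False
```

A flagged response is a cue to revisit the settings, starting with temperature.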

These aren't just technical knobs to twist. They change how the model chooses every word, and that shapes the character of the whole response. Low settings make it conservative and predictable. High settings make it exploratory and unpredictable. Match the configuration to your task's needs, not to what sounds most impressive.