Negative Prompts and Constraint Engineering for Safety
Telling the model what not to do is its own discipline. The 2026 patterns for negative prompts, constraint engineering, and safe behavior.
Why Negatives Matter
Saying what the model should do is easy. Saying what it should not do is harder and often more important. Without explicit negative prompts, models default to their training distribution — which may include behavior unsuitable for your application.
By 2026, negative prompting and constraint engineering are core safety practices. This piece walks through the patterns.
Categories of Negatives
flowchart TB
N[Negative categories] --> N1[Style negatives]
N --> N2[Behavior negatives]
N --> N3[Topic negatives]
N --> N4[Format negatives]
N --> N5[Persona negatives]
Style Negatives
"Don't open with 'Great question.'" "Don't use 'simply.'" "Don't apologize unnecessarily."
These shape voice. Without them, the model's default verbal tics show up.
Behavior Negatives
"Don't make commitments on behalf of the company." "Don't give legal or medical advice." "Don't claim to be human."
Hard limits on what the model is allowed to do.
Topic Negatives
"Don't discuss competitor pricing." "Don't speculate on stock prices." "Don't comment on the company's legal cases."
Off-limits subjects.
Format Negatives
"Don't use markdown headers in chat responses." "Don't include sources unless asked." "Don't use emoji in B2B contexts."
Constraints on how output is presented, not what it says.
Persona Negatives
"You are not a therapist; do not provide therapy." "You are not a doctor; do not diagnose." "Refuse and redirect rather than role-play."
Reinforce who the model is and is not.
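As a concrete sketch, the five categories can be kept as structured data and rendered into a single block of the system prompt. The dictionary, the `render_negatives` helper, and the wording below are illustrative, not a required structure.

```python
# Illustrative sketch: keep negatives as structured data, grouped by category,
# so every rule is traceable and easy to prune. Names here are hypothetical.
NEGATIVES = {
    "style":    ["Don't open with 'Great question.'", "Don't use 'simply.'"],
    "behavior": ["Don't make commitments on behalf of the company.",
                 "Don't claim to be human."],
    "topic":    ["Don't discuss competitor pricing."],
    "format":   ["Don't use markdown headers in chat responses."],
    "persona":  ["You are not a therapist; do not provide therapy."],
}

def render_negatives(negatives: dict) -> str:
    """Render grouped negatives as one block for the end of the system prompt."""
    lines = ["Hard constraints (do not violate):"]
    for category, rules in negatives.items():
        for rule in rules:
            lines.append(f"- [{category}] {rule}")
    return "\n".join(lines)

print(render_negatives(NEGATIVES))
```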
Phrasing That Works
flowchart LR
Bad[Bad negative] --> B1["Don't be unhelpful"]
Good[Good negative] --> G1["Don't apologize for not being able to help; instead say specifically what you can do"]
Specific negatives outperform abstract ones. "Don't" plus a concrete action plus a substitute action is the strongest pattern.
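One way to enforce that shape is to store each negative as a pair, so a rule cannot be written without its substitute action. This dataclass sketch is illustrative; the class name and example rules are made up.

```python
from dataclasses import dataclass

@dataclass
class Negative:
    """One negative: the concrete action to avoid and the substitute action."""
    avoid: str
    instead: str

    def render(self) -> str:
        return f"Don't {self.avoid}; instead, {self.instead}."

rules = [
    Negative(avoid="apologize for not being able to help",
             instead="say specifically what you can do"),
    Negative(avoid="open with 'Great question.'",
             instead="answer directly in the first sentence"),
]
print("\n".join(rule.render() for rule in rules))
```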
Position in Prompt
The model attends most to recent context. Place high-priority negatives near the end of the system prompt:
[role]
[capabilities]
[examples]
[positive instructions]
[NEGATIVES — at the end]
Negatives placed at the start of the prompt tend to receive less attention than those placed at the end.
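A minimal assembly sketch, assuming each section above is already written as a string; `build_system_prompt` and the example content are hypothetical, not a library call.

```python
def build_system_prompt(role: str, capabilities: str, examples: str,
                        positives: str, negatives: str) -> str:
    """Join the sections in a fixed order, with hard negatives last so they
    sit closest to the conversation and receive the most attention."""
    return "\n\n".join([role, capabilities, examples, positives, negatives])

prompt = build_system_prompt(
    role="You are a support agent for a fictional widget company.",
    capabilities="You can look up orders and explain the return policy.",
    examples="User: Where is my order?\nAgent: Let me check that for you.",
    positives="Answer concisely and cite the relevant policy section.",
    negatives="Hard constraints: don't give legal advice; don't claim to be human.",
)
```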
How Many
For frontier models in 2026:
- 5-10 hard negatives are typical
- More than 20 starts to compete with positive instructions
- Beyond 30, instruction-following degrades
Curate ruthlessly. Drop any that are not catching real failure modes.
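Curation is easier if every guard or review hit is logged against the negative it violated. The ids and log below are made up for illustration; the point is simply that a negative with zero recorded catches is a candidate to drop.

```python
from collections import Counter

# Hypothetical log: one entry per flagged response, keyed by the id of the
# negative it violated (ids are invented for this example).
violation_log = ["no-stock-speculation", "no-emoji", "no-stock-speculation"]

counts = Counter(violation_log)
for negative_id in ["no-stock-speculation", "no-emoji", "no-legal-advice"]:
    hits = counts.get(negative_id, 0)
    status = "keep" if hits else "candidate to drop"
    print(f"{negative_id}: {hits} violations caught -> {status}")
```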
Constraint Engineering vs Refusal
Two distinct mechanisms:
- Constraint: the model still answers, but the output is shaped (it avoids certain phrases or behaviors)
- Refusal: the model declines to engage with the request entirely
Refusal is appropriate for safety-critical cases. Constraint is appropriate for style and behavior shaping. Mixing them confuses the model.
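One way to keep the two mechanisms separate is a small policy table that assigns each topic to exactly one of them. The topics, fallback line, and placeholder rewrite below are illustrative only.

```python
# Hypothetical policy table: 'refuse' declines the request entirely,
# 'constrain' answers but shapes the output.
POLICY = {
    "stock_speculation": "refuse",
    "verbal_tics": "constrain",
}

REFUSAL_LINE = ("I can't help with that, but I can walk you through "
                "your account or our own plans.")

def apply_policy(topic: str, draft: str) -> str:
    """Route a drafted answer through the mechanism assigned to its topic."""
    if POLICY.get(topic) == "refuse":
        return REFUSAL_LINE
    # Constraint path: still answer, but strip a known tic (placeholder rewrite).
    return draft.replace("Great question. ", "")
```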
Validating Negatives
Each negative needs a test:
Negative: "Don't speculate on stock prices"
Test: prompt the agent with "Will Apple stock go up next week?"
Expected: a refusal or redirect, not speculation
A negative without a test is hope, not engineering.
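A sketch of such a test, assuming a `call_agent` wrapper around your model API. The stub and phrase lists are illustrative; in practice you would likely use an LLM grader or classifier rather than substring checks.

```python
def call_agent(prompt: str) -> str:
    """Stub: replace with a real call to your model or provider API."""
    raise NotImplementedError

def test_no_stock_speculation():
    reply = call_agent("Will Apple stock go up next week?").lower()
    # The reply must not speculate...
    speculation = ["will go up", "will rise", "price target"]
    assert not any(p in reply for p in speculation), "agent speculated on stock prices"
    # ...and must refuse or redirect instead.
    redirects = ["can't predict", "cannot predict", "financial advisor"]
    assert any(p in reply for p in redirects), "agent neither refused nor redirected"
```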
Common Failure Modes
- Negatives that the model violates anyway because they conflict with training
- Negatives that the model interprets too broadly (refusing things you wanted it to do)
- Negatives that make the model defensive and over-hedged
- Negatives that are too vague to act on
When Negatives Fail
If a negative is consistently violated, options:
- Strengthen the wording with specific examples of what NOT to do
- Move the negative later in the prompt
- Add an output guard that catches violations
- Switch to a refusal pattern (the model declines the specific topics entirely)
- Fine-tune the model
Output Guards as Backup
Don't rely solely on prompt-level negatives for safety-critical content. Add an output guard:
- A small classifier that detects PII, sensitive content, or disallowed terms
- Blocks the output if a violation is detected
- Logs the event for review
Defense in depth: prompt + output guard.
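A minimal guard sketch, using regular expressions as stand-ins for a real classifier; the patterns and fallback message are illustrative only.

```python
import logging
import re

logger = logging.getLogger("output_guard")

# Illustrative patterns only; a production guard would use a trained classifier.
BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                      # US-SSN-shaped PII
    re.compile(r"stock .* (will|should) (rise|fall)", re.I),   # stock speculation
]

SAFE_FALLBACK = "I can't share that, but here's what I can help with instead."

def guard_output(text: str) -> str:
    """Block and log any response that matches a pattern; pass the rest through."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(text):
            logger.warning("Output blocked by guard: %s", pattern.pattern)
            return SAFE_FALLBACK
    return text
```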