AI / Claude Models Basics Interview Questions
1. What is Claude and who makes it?
Claude is a family of state-of-the-art large language models (LLMs) built by Anthropic, an AI safety company founded in 2021. Claude excels at language, reasoning, analysis, coding, mathematics, and creative writing, and is designed with a strong focus on being helpful, harmless, and honest.
Anthropic makes Claude available in two main ways:
- claude.ai — a consumer and business chat interface (Free, Pro, Team, Enterprise, and Max plans)
- Claude API — a developer API for building applications, also available through Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry
Claude is trained using a technique called Constitutional AI (CAI), which guides the model's values and behaviour using a set of principles rather than relying purely on human feedback for every response. This approach is central to Anthropic's mission of building AI that is safe and beneficial.
2. What are the current Claude model families and what is each one optimised for?
Claude models are organised into three tiers — Opus, Sonnet, and Haiku — representing a capability-speed-cost spectrum. As of mid-2026 the current flagship generation is Claude 4/5, alongside the newly released Claude Fable 5.
| Tier | Optimised for | Example model |
|---|---|---|
| Opus | Maximum capability — complex reasoning, coding, agentic tasks, enterprise work | Claude Opus 4.8 |
| Sonnet | Best balance of intelligence and speed — coding, agents, enterprise workflows | Claude Sonnet 5 |
| Haiku | Fastest responses with near-frontier intelligence — high-throughput, latency-sensitive tasks | Claude Haiku 4.5 |
Additionally, Claude Fable 5 sits above the Opus tier as Anthropic's most capable widely-released model, described as providing next-generation intelligence for long-running agents.
Claude Mythos 5 shares Fable 5's specifications but is offered only through the invitation-only Project Glasswing programme for defensive cybersecurity workflows.
3. What are the API model IDs for the current Claude models?
When making API calls you must specify the exact model ID string. Model IDs starting with the Claude 4.6 generation use a dateless format that is still a pinned snapshot (not an evergreen pointer). Earlier models used a dated format like claude-haiku-4-5-20251001.
| Model | API ID | Alias |
|---|---|---|
| Claude Fable 5 | claude-fable-5 | claude-fable-5 |
| Claude Opus 4.8 | claude-opus-4-8 | claude-opus-4-8 |
| Claude Sonnet 5 | claude-sonnet-5 | claude-sonnet-5 |
| Claude Haiku 4.5 | claude-haiku-4-5-20251001 | claude-haiku-4-5 |
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-8",  # exact API model ID
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain what a context window is."}
    ],
)
print(message.content[0].text)
```
The Bedrock IDs follow a different pattern (e.g. anthropic.claude-opus-4-83) and should be used when accessing Claude through Amazon Bedrock. Google Cloud IDs match the Claude API IDs but may append a regional variant.
4. What is a context window and what are the context window sizes for current Claude models?
A context window is the total number of tokens (words, punctuation, code, etc.) that a model can process in a single request — encompassing the system prompt, all conversation history, tool definitions, and the model's own output. If you exceed the context window, older content must be removed or the request will fail.
| Model | Context window | Max output tokens |
|---|---|---|
| Claude Fable 5 | 1 million tokens | 128,000 tokens |
| Claude Opus 4.8 | 1 million tokens | 128,000 tokens |
| Claude Sonnet 5 | 1 million tokens | 128,000 tokens |
| Claude Haiku 4.5 | 200,000 tokens | 64,000 tokens |
Key facts:
- The 1M token context window is the default for Fable 5, Opus 4.8, and Sonnet 5 — no beta header is needed and it is billed at standard pricing
- Claude Haiku 4.5 has a smaller 200k token window and 64k max output
- On the Message Batches API, Opus 4.8, Opus 4.7, Opus 4.6, Sonnet 5, and Sonnet 4.6 support up to 300k output tokens using the output-300k-2026-03-24 beta header
- A single request can include up to 600 images/PDF pages (100 for 200k models)
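The budgeting rule above (input plus reserved output must fit inside the window) can be sketched as a small pre-flight check. The limits are copied from the table in this section; the helper name `fits_context` and the dictionary layout are illustrative assumptions, not part of any SDK.

```python
# Limits copied from the table above: (context window, max output tokens).
MODEL_LIMITS = {
    "claude-fable-5": (1_000_000, 128_000),
    "claude-opus-4-8": (1_000_000, 128_000),
    "claude-sonnet-5": (1_000_000, 128_000),
    "claude-haiku-4-5": (200_000, 64_000),
}


def fits_context(model: str, input_tokens: int, max_tokens: int) -> bool:
    """Return True if input plus reserved output fits the model's limits.

    Hypothetical helper: input_tokens is everything sent (system prompt,
    history, tool definitions); max_tokens is the output reservation.
    """
    window, max_output = MODEL_LIMITS[model]
    if max_tokens > max_output:
        return False  # asks for more output than the model can produce
    return input_tokens + max_tokens <= window


print(fits_context("claude-haiku-4-5", 150_000, 64_000))  # False: 214k > 200k
print(fits_context("claude-opus-4-8", 150_000, 64_000))   # True
```

A check like this is only as good as the input token count fed into it, so in practice it would likely be paired with a real token count rather than a guess.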
5. What are the pricing tiers for current Claude models and how is pricing calculated?
Claude API pricing is charged per million tokens (MTok) — counting both input tokens (your prompt, system prompt, conversation history) and output tokens (Claude's response). Prices differ by model and reflect the capability-cost trade-off.
| Model | Input (per MTok) | Output (per MTok) |
|---|---|---|
| Claude Fable 5 | $10 | $50 |
| Claude Opus 4.8 | $5 | $25 |
| Claude Sonnet 5 | $3 (intro: $2 until Aug 31 2026) | $15 (intro: $10) |
| Claude Haiku 4.5 | $1 | $5 |
Cost-saving features:
- Prompt caching — reuse of cached prompt prefixes is billed at a significant discount (cache writes are more expensive; cache reads are cheaper than standard input)
- Message Batches API — async batch processing at roughly 50% of standard pricing, ideal for large-scale, non-real-time workloads
Cloud platform pricing (Amazon Bedrock, Google Cloud) may differ from direct API pricing. See the Pricing page for full details including per-region variations.
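To make the per-MTok arithmetic concrete, here is a minimal cost estimator built from the standard prices in the table above. The function name `estimate_cost` is an assumption for illustration, the batch discount is taken as exactly 50% although the text says "roughly", and prompt caching and the Sonnet 5 introductory rate are ignored.

```python
# USD per million tokens (input, output), copied from the pricing table above.
# Sonnet 5 uses the standard rate here, not the introductory one.
PRICES_PER_MTOK = {
    "claude-fable-5": (10.0, 50.0),
    "claude-opus-4-8": (5.0, 25.0),
    "claude-sonnet-5": (3.0, 15.0),
    "claude-haiku-4-5": (1.0, 5.0),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  batch: bool = False) -> float:
    """Estimate request cost in USD; batch=True halves it (assumed 50%)."""
    input_price, output_price = PRICES_PER_MTOK[model]
    cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return cost * 0.5 if batch else cost


# 20k input and 2k output tokens on Opus 4.8:
# 20,000 * $5/M + 2,000 * $25/M = $0.10 + $0.05 = $0.15
print(round(estimate_cost("claude-opus-4-8", 20_000, 2_000), 4))
print(round(estimate_cost("claude-opus-4-8", 20_000, 2_000, batch=True), 4))
```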
6. What input and output modalities do current Claude models support?
All current Claude models share a common set of supported modalities for input and output, with no difference between Opus, Sonnet, and Haiku tiers on core capabilities.
| Capability | Supported? | Notes |
|---|---|---|
| Text input | Yes | All models |
| Image input (vision) | Yes | All models — up to 600 images per request (100 for 200k models) |
| PDF input | Yes | Treated similarly to images for token budgeting |
| Text output | Yes | All models |
| Multilingual | Yes | Strong performance across major languages |
| Tool use / function calling | Yes | All models |
| Extended thinking | Haiku 4.5 only | Explicit thinking steps visible in output |
| Adaptive thinking | Opus and Sonnet (not Haiku 4.5) | Always-on for Fable 5 |
| Audio input | No | Not currently supported |
| Video input | No | Use frame extraction for video analysis |
Vision notes: Claude models can analyse images, charts, screenshots, UI elements, and document scans. For video analysis, the recommended approach is to extract frames and send them as a series of images. Claude Opus 4.5 and 4.6 showed improved vision capabilities — especially for multi-image tasks and computer use.
7. What is extended thinking and how does it differ from adaptive thinking in Claude?
Both features enable Claude to reason more carefully before answering, but they work differently and are available on different models.
| Feature | Extended Thinking | Adaptive Thinking |
|---|---|---|
| What it does | Claude produces explicit, visible thinking blocks before its answer | Claude internally allocates more reasoning compute when a task requires it, with no visible thinking blocks |
| Availability | Claude Haiku 4.5 only | Claude Opus 4.8, Sonnet 5, and Claude Fable 5 (always-on for Fable 5) |
| User control | Opt-in — you enable it with a parameter | Automatic on supported models; Fable 5 always uses it |
| Use case | When you want to see and verify the model's reasoning chain | General accuracy improvement, especially for complex tasks |
| Output impact | Adds thinking tokens to the response (billed separately) | No additional visible output |
```python
# Enabling extended thinking on Claude Haiku 4.5
message = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # max tokens for thinking
    },
    messages=[{"role": "user", "content": "Solve: if 3x + 7 = 22, what is x?"}],
)
# Response includes a "thinking" content block followed by the answer
```
Interleaved thinking (thinking between tool calls) is automatic on models with adaptive thinking and requires the interleaved-thinking-2025-05-14 beta header on earlier models like Opus 4.5 and Sonnet 4.5.
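Because a thinking block comes before the answer when extended thinking is enabled, reading `content[0].text` no longer returns the answer. The sketch below separates the two kinds of block. It works on plain dictionaries shaped like content blocks, and both the sample data and the helper name `split_thinking` are made up for illustration.

```python
def split_thinking(content: list[dict]) -> tuple[str, str]:
    """Return (thinking, answer) text from a list of content-block dicts."""
    thinking_parts = []
    answer_parts = []
    for block in content:
        if block["type"] == "thinking":
            thinking_parts.append(block["thinking"])
        elif block["type"] == "text":
            answer_parts.append(block["text"])
    return "\n".join(thinking_parts), "".join(answer_parts)


# Hand-written sample in the shape of a response's content list.
sample_content = [
    {"type": "thinking", "thinking": "3x = 22 - 7 = 15, so x = 5."},
    {"type": "text", "text": "x = 5"},
]
thinking, answer = split_thinking(sample_content)
print(answer)  # x = 5
```

With SDK response objects the same loop would presumably read attributes (`block.type`, `block.text`) instead of dictionary keys.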
8. What platforms and cloud providers is Claude available on?
Claude is available through multiple channels, each with its own model IDs, endpoint behaviour, and pricing structure.
| Platform | Best for | Model ID format |
|---|---|---|
| Claude API (direct) | Developers building directly with Anthropic | claude-opus-4-8, claude-sonnet-5, etc. |
| claude.ai | Consumer and business chat (Free, Pro, Team, Enterprise, Max) | Model selected in UI |
| Amazon Bedrock | AWS-native teams — data stays in AWS region | anthropic.claude-opus-4-83 |
| Claude Platform on AWS | Same API shape as Claude API but on AWS infrastructure | claude-opus-4-8 (same as direct API) |
| Google Cloud Vertex AI | GCP-native teams | claude-opus-4-8 (same IDs, regional variants) |
| Microsoft Foundry | Azure-native teams | claude-sonnet-5, claude-fable-5, etc. |
Important distinctions:
- Amazon Bedrock and Claude Platform on AWS are different products with different model ID formats and lifecycle policies
- Claude Platform on AWS uses the same model IDs as the direct Claude API and follows Anthropic's own deprecation schedule
- Amazon Bedrock uses its own model ID format and sets its own retirement schedules
- Starting with Sonnet 4.5, Bedrock offers global endpoints (dynamic routing) and regional endpoints (data sovereignty)
9. What is the knowledge cutoff for current Claude models?
Claude models are trained on data up to a specific date (the training data cutoff) and have the most reliable knowledge through a slightly earlier date (the reliable knowledge cutoff). Claude does not have access to real-time internet data during a conversation unless given a search tool.
| Model | Reliable knowledge cutoff | Training data cutoff |
|---|---|---|
| Claude Fable 5 | January 2026 | January 2026 |
| Claude Opus 4.8 | January 2026 | January 2026 |
| Claude Sonnet 5 | January 2026 | January 2026 |
| Claude Haiku 4.5 | February 2025 | July 2025 |
Reliable knowledge cutoff indicates the date through which a model's knowledge is most extensive and reliable. Training data cutoff is the broader date range — the model may have some knowledge of events after the reliable cutoff but it is less complete and should be treated with more caution.
For tasks requiring current information (news, prices, live data), give Claude access to a web search tool or provide the relevant information directly in the prompt.
10. What is the Claude model lifecycle — what do 'Active', 'Legacy', and 'Deprecated' mean?
Anthropic uses a defined set of lifecycle statuses for Claude models. Understanding these helps teams plan migration timelines and avoid unexpected outages.
| Status | Meaning | Action required? |
|---|---|---|
| Active | Fully supported and recommended for new development | No — this is the ideal state |
| Legacy | No longer receiving updates; may be deprecated in the future | Start planning migration |
| Deprecated | Still functional but a retirement date has been set; a replacement is recommended | Migrate before the retirement date |
| Retired | API calls to this model return an error | Must have already migrated |
Key policy points:
- Anthropic provides at least 60 days' notice before retiring any publicly released model
- Customers with active deployments receive email notifications when a model they use is scheduled for retirement
- Retirement dates on Anthropic-operated platforms (Claude API, Claude Platform on AWS, Microsoft Foundry) may differ from partner platforms (Amazon Bedrock, Google Cloud)
- Anthropic has committed to long-term preservation of model weights even after retirement
11. What is Claude Fable 5 and what makes it different from Claude Opus 4.8?
Claude Fable 5 (claude-fable-5) is Anthropic's most capable widely-released model as of mid-2026, positioned above the Opus tier. It is designed for long-running agents, frontier intelligence tasks, and complex enterprise work.
| Feature | Claude Fable 5 | Claude Opus 4.8 |
|---|---|---|
| API ID | claude-fable-5 | claude-opus-4-8 |
| Context window | 1 million tokens | 1 million tokens |
| Max output tokens | 128,000 | 128,000 |
| Thinking | Always-on adaptive thinking | Adaptive thinking (configurable) |
| Pricing (input) | $10 / MTok | $5 / MTok |
| Pricing (output) | $50 / MTok | $25 / MTok |
| Data retention | 30-day minimum (no ZDR) | Available under ZDR |
| Availability | GA on Claude API, Bedrock, Vertex, Foundry | GA on all platforms |
Key differences to be aware of:
- Fable 5 is priced at 2× Opus 4.8 for both input and output tokens
- Fable 5 requires a minimum 30-day data retention period and is not available under zero data retention (ZDR) arrangements — organisations with ZDR requirements should use Opus 4.8
- Fable 5 uses always-on adaptive thinking, meaning it automatically applies extended reasoning to every request
- Migration from Opus 4.8 to Fable 5 is described as mostly drop-in since both use the same Messages API and tool use patterns
12. What is Claude Mythos 5 and how does it differ from Claude Fable 5?
Claude Mythos 5 (claude-mythos-5) is a variant of Claude Fable 5 that is offered separately for defensive cybersecurity workflows as part of Anthropic's Project Glasswing. It is not publicly available — access is invitation-only with no self-serve sign-up.
| Feature | Claude Fable 5 | Claude Mythos 5 |
|---|---|---|
| API ID | claude-fable-5 | claude-mythos-5 |
| Availability | Generally available (GA) | Invitation-only via Project Glasswing |
| Safety classifiers | Yes — standard safety classifiers | Without standard safety classifiers |
| Intended use | General purpose, agents, enterprise | Defensive cybersecurity workflows |
| Context window | 1 million tokens | 1 million tokens |
| Max output | 128,000 tokens | 128,000 tokens |
| Pricing | $10 / $50 per MTok | Contact Anthropic |
The key architectural difference is that Mythos 5 operates without the standard safety classifiers that Fable 5 uses. This makes it suitable for certain security research and offensive-capability testing in a controlled, vetted environment — but unsuitable and inaccessible for general use. Anthropic controls access tightly through Project Glasswing.
13. What is Claude Haiku 4.5 and what are its key characteristics?
Claude Haiku 4.5 (claude-haiku-4-5-20251001, alias claude-haiku-4-5) is Anthropic's fastest and most cost-efficient model in the current generation. It is described as achieving near-frontier performance on coding, computer use, and agent tasks while being optimised for speed and low latency.
| Property | Value |
|---|---|
| API ID | claude-haiku-4-5-20251001 (alias: claude-haiku-4-5) |
| Context window | 200,000 tokens |
| Max output tokens | 64,000 tokens |
| Pricing (input) | $1 per million tokens |
| Pricing (output) | $5 per million tokens |
| Thinking | Extended thinking (opt-in, explicit reasoning blocks) |
| Context awareness | Yes — tracks its token budget throughout a conversation |
| Best for | High-throughput, latency-sensitive tasks; customer service; simple classification |
Extended thinking on Haiku 4.5: unlike the Opus and Sonnet tiers, which use adaptive thinking, Haiku 4.5 supports the explicit extended thinking mode, in which reasoning steps appear as visible <thinking> blocks in the API response. This is opt-in via the thinking parameter.
Haiku 4.5 was announced as matching Sonnet 4's performance on coding, computer use, and agent tasks while costing significantly less per token.
14. What is prompt caching and how does it reduce costs when using Claude?
Prompt caching allows Anthropic to store a copy of a prompt prefix (such as a long system prompt, documentation, or conversation history) so that subsequent requests reusing that prefix are billed at a much lower rate than re-sending it fresh each time.
| Token type | Cost vs standard input |
|---|---|
| Cache write | ~25% more expensive (one-time cost to store the prefix) |
| Cache read (hit) | ~90% cheaper than standard input tokens |
| Standard input (miss) | Full input price |
```python
# Example: using prompt caching with a long system prompt
message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert assistant...[5000 token system prompt]...",
            "cache_control": {"type": "ephemeral"},  # mark this prefix for caching
        }
    ],
    messages=[{"role": "user", "content": "What is the main point of section 3?"}],
)
```
When to use prompt caching:
- Long system prompts reused across many requests
- Large reference documents (codebases, manuals, books) that are constant across a session
- Long conversation histories in multi-turn applications
- Few-shot example sets provided in every request
Cache entries expire after a period of inactivity (5 minutes by default; a 1-hour TTL beta is available). The cache is per-organisation, not per-user.
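The trade-off between the write surcharge and the read discount can be worked through numerically. The sketch below treats the approximate multipliers from the table above as exact (1.25× for a write, 0.10× for a read) and assumes every call after the first is a cache hit. The function name `prefix_cost` is hypothetical.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # ~25% more than standard input (table above)
CACHE_READ_MULTIPLIER = 0.10   # ~90% cheaper than standard input (table above)


def prefix_cost(prefix_tokens: int, requests: int, price_per_mtok: float,
                cached: bool) -> float:
    """Cost in USD of sending the same prefix on `requests` consecutive calls.

    Assumes every call after the first is a cache hit (no expiry in between).
    """
    per_call = prefix_tokens * price_per_mtok / 1_000_000
    if not cached or requests == 0:
        return per_call * requests
    first_call = per_call * CACHE_WRITE_MULTIPLIER
    later_calls = per_call * CACHE_READ_MULTIPLIER * (requests - 1)
    return first_call + later_calls


# A 5,000-token system prompt on Opus 4.8 ($5 per MTok input), sent 100 times.
uncached = prefix_cost(5_000, 100, 5.0, cached=False)
cached = prefix_cost(5_000, 100, 5.0, cached=True)
print(f"uncached ${uncached:.4f}, cached ${cached:.4f}")
```

Under these multipliers a single request costs 25% more with caching, and the saving appears from the second request onward (1.25 + 0.10 = 1.35 units against 2.00 uncached).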
15. What is the Message Batches API and when should you use it?
The Message Batches API allows you to submit a large number of Claude API requests asynchronously in a single batch, receiving results once all requests are processed. It is designed for large-scale, non-time-sensitive workloads.
| Feature | Standard Messages API | Message Batches API |
|---|---|---|
| Execution | Synchronous — response returned immediately | Asynchronous — submit batch, poll for results |
| Pricing | Standard per-token pricing | ~50% of standard pricing |
| Max output tokens | Standard limits | Up to 300k with output-300k beta header (selected models) |
| Latency | Real-time (<1 min typical) | Hours — not suitable for real-time apps |
| Max requests per batch | N/A | 10,000 requests per batch |
| Use case | Interactive apps, chatbots, real-time tools | Data processing, evals, bulk content generation |
```python
import time

# Submitting a batch of requests
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "request-1",
            "params": {
                "model": "claude-opus-4-8",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Summarise: ..."}],
            },
        },
        # ... up to 10,000 requests
    ]
)

# Poll until processing has ended; results are not available before then
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(60)  # polling interval is an arbitrary choice
results = client.messages.batches.results(batch.id)
```
Models supported for 300k batch output: Claude Opus 4.8, Opus 4.7, Opus 4.6, Sonnet 5, and Sonnet 4.6 support up to 300k output tokens per request in a batch when the output-300k-2026-03-24 beta header is included.
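When a workload exceeds the per-batch request limit quoted in the table above, it has to be split before submission. This is a minimal sketch of that splitting step. The helper `chunk_requests` and the generated request list are illustrative assumptions, and nothing here calls the API.

```python
from typing import Iterator

MAX_REQUESTS_PER_BATCH = 10_000  # per-batch limit from the table above


def chunk_requests(requests: list[dict],
                   size: int = MAX_REQUESTS_PER_BATCH) -> Iterator[list[dict]]:
    """Yield consecutive slices of `requests`, each at most `size` long."""
    for start in range(0, len(requests), size):
        yield requests[start:start + size]


# 25,000 hypothetical requests, each with a unique custom_id.
all_requests = [
    {
        "custom_id": f"request-{i}",
        "params": {
            "model": "claude-opus-4-8",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": f"Summarise document {i}"}],
        },
    }
    for i in range(25_000)
]
chunks = list(chunk_requests(all_requests))
print([len(chunk) for chunk in chunks])  # [10000, 10000, 5000]
```

Each chunk could then be passed as the `requests` argument of a separate batch submission. Keeping `custom_id` unique across chunks makes it possible to match results back to inputs.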
16. What is tool use (function calling) in Claude and which models support it?
Tool use (also called function calling) allows Claude to request the execution of external functions and incorporate their results into its responses. You define a set of tools with names, descriptions, and input schemas; Claude decides when to call them and how to structure the arguments.
```python
# Defining a tool for Claude to use
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        }
    ],
    messages=[{"role": "user", "content": "What's the weather in London?"}],
)
# Claude responds with a tool_use block specifying the function and args
# You execute the function and return results in a tool_result block
```
All current Claude models support tool use. Key capabilities include:
- Parallel tool use — Claude can call multiple tools simultaneously in one turn
- Multi-step tool use — Claude reasons across multiple tool call/result cycles
- Computer use — special tools (bash, text editor, computer) for Claude to interact with systems
- Fine-grained tool streaming — GA on Sonnet 4.6 and later (no beta header needed)
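The tool_use / tool_result round trip described above can be sketched without calling the API. The example below dispatches tool_use blocks to local functions and builds the matching tool_result blocks. The registry, the stand-in `get_weather` implementation, and the sample assistant content are all hypothetical.

```python
def get_weather(city: str, unit: str = "celsius") -> str:
    """Stand-in implementation; a real one would call a weather service."""
    return f"18 degrees {unit} in {city}"


# Maps tool names (as declared in `tools`) to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(content: list[dict]) -> list[dict]:
    """Execute every tool_use block and return matching tool_result blocks."""
    results = []
    for block in content:
        if block["type"] != "tool_use":
            continue  # skip text blocks
        handler = TOOL_REGISTRY[block["name"]]
        output = handler(**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # ties the result to the request
            "content": output,
        })
    return results


# Hand-written sample in the shape of an assistant turn that requests a tool.
assistant_content = [
    {"type": "text", "text": "Let me check the weather."},
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
     "input": {"city": "London"}},
]
tool_results = run_tool_calls(assistant_content)
# The results go back to the model as the content of the next user message.
next_message = {"role": "user", "content": tool_results}
print(tool_results[0]["content"])  # 18 degrees celsius in London
```

Because the loop handles every tool_use block in the turn, the same function covers parallel tool use: several calls in one assistant turn produce several tool_result blocks in one user message.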
17. What is computer use in Claude and which models support it?
Computer use is a set of built-in tools that allow Claude to interact directly with computers — taking screenshots, moving the mouse, clicking, typing, and running bash commands. It is designed for agentic automation tasks where Claude operates a full computer desktop or terminal environment.
| Tool | What Claude can do |
|---|---|
| computer | Take screenshots; move/click/drag mouse; type text |
| text_editor | View and edit files with string replace; undo edits |
| bash | Execute shell commands in a persistent bash session |
```python
# Example: computer use API call (simplified)
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    tools=[
        {
            "type": "computer_20250728",
            "name": "computer",
            "display_width_px": 1024,
            "display_height_px": 768,
        },
        {"type": "text_editor_20250728", "name": "str_replace_based_edit_tool"},
        {"type": "bash_20250728", "name": "bash"},
    ],
    messages=[{"role": "user", "content": "Open the browser and search for Anthropic."}],
)
```
Model support for computer use: all current Claude models support computer use. The tool versions have been updated: use computer_20250728, text_editor_20250728, and bash_20250728 for Claude Opus 4.7 and later. Earlier tool versions remain supported for older models.
Computer use is in beta — Anthropic recommends using it in sandboxed environments with human oversight, as it can execute arbitrary commands on a system.
18. What are the different claude.ai plans and what does each include?
claude.ai offers multiple subscription tiers designed for individuals, teams, and enterprises. Each tier provides different levels of usage, features, and access to Claude models.
| Plan | Who it's for | Key features |
|---|---|---|
| Free | Individual — casual use | Access to Claude; usage limits; no credit card required |
| Pro | Power users — daily use | 5× more usage than Free; access to more powerful models including Opus; Projects; priority access |
| Team | Small/medium teams | Everything in Pro; admin controls; higher usage limits; billing management; expanded context |
| Enterprise | Large organisations | Unlimited seats; SSO; advanced security; admin analytics; priority support; custom retention |
| Max | Highest usage needs | Maximum usage limits; access to all models including the latest; for power users who need more than Pro |
Model access by plan: Free plan users typically access Haiku or Sonnet models. Pro and higher plans provide access to Opus-tier models and the latest releases. The Max plan provides the broadest model access and highest usage limits.
Enterprise plans are now available for self-serve purchase directly on the Anthropic website — no sales conversation required for standard configurations.
19. What is the effort parameter in Claude and which models support it?
The effort parameter allows you to trade intelligence for latency and cost within a single model — rather than switching to a different model. It is available on recent Opus and Sonnet models.
| Level | Behaviour | Use case |
|---|---|---|
| low | Fastest, least compute — lighter reasoning | Simple tasks, classification, short responses |
| medium | Balanced compute | General purpose tasks |
| high | Strong reasoning; default on Opus 4.8 | Most coding, analysis, complex tasks |
| xhigh | Maximum reasoning — highest latency and cost | Hardest coding problems, high-autonomy agentic work |
```python
# Using the effort parameter
message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    effort="xhigh",  # use max reasoning for this hard task
    messages=[{"role": "user", "content": "Solve this complex algorithmic problem..."}],
)

# For simpler tasks, use lower effort to save time and cost
message_fast = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=256,
    effort="low",
    messages=[{"role": "user", "content": "What is 12 + 7?"}],
)
```
Model support: the effort parameter is available on Claude Opus 4.8 and Claude Opus 4.7. The documentation recommends tuning effort as a first lever before switching models. The xhigh effort level on Opus 4.8 is described as the best setting for coding and high-autonomy agentic tasks.
Note: fast mode (a related but distinct feature) on Claude Opus 4.7 is deprecated with removal scheduled for July 24, 2026.
20. What is streaming in Claude API responses and how do you use it?
Streaming allows you to receive Claude's response token by token as it is generated, rather than waiting for the complete response. This dramatically reduces the time to first token and creates a more responsive user experience for chat applications.
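```python
# Streaming with the Python SDK
with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short story."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # print each text chunk as it arrives

# Or iterating over the underlying stream events
with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short story."}],
) as stream:
    for event in stream:
        # Only text deltas carry a .text field; tool-call deltas do not
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            print(event.delta.text, end="")
```
| Event | When it fires |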
|---|---|
| message_start | Once at the beginning — includes usage metadata |
| content_block_start | When a new content block (text, tool_use) begins |
| content_block_delta | For each token chunk — contains the text delta |
| content_block_stop | When a content block finishes |
| message_delta | When stop_reason or usage is updated |
| message_stop | Once when the response is fully complete |
Streaming is supported on all current Claude models. Fine-grained tool streaming (streaming tool call arguments as they are generated) is generally available on Sonnet 4.6 and later models with no beta header required.
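The event order in the table above can be traced with a small offline simulation. The events below are hand-written dictionaries that follow that order, and `collect_text` is a hypothetical helper. A real SDK stream yields objects with attributes rather than dictionaries.

```python
def collect_text(events: list[dict]) -> str:
    """Concatenate the text deltas from a sequence of stream events."""
    parts = []
    for event in events:
        if event["type"] != "content_block_delta":
            continue  # start/stop and message-level events carry no text
        delta = event["delta"]
        if delta.get("type") == "text_delta":
            parts.append(delta["text"])
    return "".join(parts)


# Hand-written events following the order in the table above.
sample_events = [
    {"type": "message_start"},
    {"type": "content_block_start", "index": 0},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Once upon "}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "a time"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}},
    {"type": "message_stop"},
]
print(collect_text(sample_events))  # Once upon a time
```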
21. What is the system prompt in Claude and how does it affect model behaviour?
The system prompt is an optional instruction block passed at the start of a conversation that sets Claude's persona, context, constraints, and behavioural guidelines before the first user message. It is processed before any human turn and shapes how Claude responds throughout the conversation.
```python
# System prompt in the Messages API
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system=(
        "You are a helpful customer service agent for Acme Corp. "
        "Always be polite and concise. "
        "Only answer questions about Acme products. "
        "If a question is off-topic, politely redirect the user."
    ),
    messages=[
        {"role": "user", "content": "What are your return policies?"}
    ],
)

# System prompt can also be a list of content blocks
# (required when using prompt caching or structured content)
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant...[long context]...",
            "cache_control": {"type": "ephemeral"},  # cache the system prompt
        }
    ],
    messages=[{"role": "user", "content": "Help me with X."}],
)
```
Key facts about system prompts:
- The system prompt is not part of the messages array; it is a separate top-level parameter
- It counts against the context window token limit just like message content
- For long system prompts used repeatedly, prompt caching provides significant cost savings
- Operators (API users) can set system prompts; users (end-users in a product) interact via the human turn
- Claude's core safety behaviours cannot be overridden via the system prompt
22. What is zero data retention (ZDR) and which Claude models support it?
Zero data retention (ZDR) is a data handling agreement where Anthropic does not store API inputs or outputs after a response is returned. This is important for organisations with strict data privacy requirements (healthcare, legal, finance) where conversation data must not persist on Anthropic's servers.
| Model | ZDR available? |
|---|---|
| Claude Fable 5 | No — requires 30-day minimum retention |
| Claude Opus 4.8 | Yes |
| Claude Sonnet 5 | Yes |
| Claude Haiku 4.5 | Yes |
| Claude Opus 4.7, 4.6 | Yes |
| Claude Mythos 5 | Contact Anthropic |
How ZDR works:
- ZDR must be arranged as part of an API agreement — it is not a per-request option
- With ZDR, Anthropic does not log or store prompt/completion data after the API response is delivered
- ZDR is separate from prompt caching — cached data is still subject to your data handling agreement
- Organisations with ZDR requirements who want the highest capability model should use Claude Opus 4.8 rather than Fable 5
- ZDR customers are still subject to Anthropic's usage policies and safety systems
23. What is Claude's approach to safety and what are Constitutional AI principles?
Anthropic builds Claude with a strong emphasis on AI safety — designing the model to be helpful, honest, and to avoid causing harm. The primary training technique underpinning Claude's values is Constitutional AI (CAI).
Constitutional AI works by training the model against a set of written principles (a 'constitution') rather than relying solely on human labelling for every possible scenario. The process involves:
- Supervised learning phase — the model is trained to follow the constitution's principles
- Reinforcement learning from AI feedback (RLAIF) — the model critiques and revises its own outputs based on the constitutional principles, without requiring a human label for every revision
Claude's four core properties (in priority order):
| Priority | Property | Meaning |
|---|---|---|
| 1 (highest) | Broadly safe | Supporting human oversight of AI during the current development phase |
| 2 | Broadly ethical | Having good personal values, being honest, avoiding harmful actions |
| 3 | Adherent to Anthropic's principles | Acting in accordance with Anthropic's guidelines where relevant |
| 4 | Genuinely helpful | Benefiting operators and users |
Being broadly safe is prioritised above ethics because Claude may make mistakes, and preserving human ability to correct those mistakes is currently more important than any individual decision.
24. What is the difference between an operator and a user in Claude's design?
Anthropic distinguishes between two types of principals who interact with Claude: operators and users. This distinction matters because it determines the level of trust Claude extends to instructions and how it resolves conflicting requests.
| Aspect | Operator | User |
|---|---|---|
| Who they are | Companies or developers accessing Claude via the API to build products | End-users who interact with Claude through a product built by an operator |
| How they interact | Via the system prompt and API configuration | Via the human turn in conversation |
| Trust level | Higher — operators agree to usage policies and take responsibility for their platform | Lower — could be anyone; Claude applies more caution by default |
| Can they expand Claude's defaults? | Yes — within limits Anthropic allows | Only if the operator explicitly grants them operator-level trust |
| Examples | A company building a customer service bot; a developer testing the API | The end-customer chatting with the customer service bot |
Trust hierarchy: Anthropic > Operators > Users. Operators can expand or restrict Claude's default behaviours for their platform (e.g. enable adult content on appropriate platforms or restrict Claude to only answer questions about their product). Operators cannot override Anthropic's core safety limits.
If there is no system prompt, Claude is likely being accessed directly by a developer and applies relatively liberal defaults.
25. What is Claude's context window and how are tokens counted?
Claude's context window is the total number of tokens it can process in a single API request. Tokens are the fundamental unit of text that Claude processes — roughly 3-4 characters per token for English, or about 75% of a word on average.
| Content type | Approximate token count |
|---|---|
| 1 word (English) | ~1.3 tokens on average |
| 1 page of text (~500 words) | ~650 tokens |
| 1,000 characters | ~250 tokens |
| A small image (~300×300) | ~1,000 tokens |
| A large image (1568×1568 or larger) | ~1,600 tokens (maximum, regardless of size) |
# Counting tokens before sending a request (avoids surprises)
token_count = client.messages.count_tokens(
    model="claude-opus-4-8",
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "How many tokens is this message?"}
    ]
)
print(f"Input tokens: {token_count.input_tokens}")
# The response also includes token usage
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)
print(f"Input: {response.usage.input_tokens}, Output: {response.usage.output_tokens}")
What counts against the context window: system prompt + all conversation messages (both human and assistant turns) + tool definitions + image/PDF content + the model's own generated output. The max_tokens parameter reserves space for the output within the window.
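For quick local budgeting, the rules of thumb in the table above can be turned into a rough estimate. This is only a heuristic sketch: estimate_tokens is a hypothetical helper, the two ratios are the approximations quoted above, and the real tokenizer will give different numbers, so count_tokens remains the source of truth.

```python
import math

CHARS_PER_TOKEN = 4.0   # assumption: roughly 4 characters per token for English
TOKENS_PER_WORD = 1.3   # assumption: roughly 1.3 tokens per English word


def estimate_tokens(text: str) -> int:
    """Rough token estimate: the mean of a character-based and a word-based guess."""
    if not text:
        return 0
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) * TOKENS_PER_WORD
    return math.ceil((by_chars + by_words) / 2)


page = "word " * 500  # stand-in for one page of about 500 short words
print(f"Estimated tokens for one page: {estimate_tokens(page)}")
```

An estimate like this is useful for deciding whether a document is anywhere near the limit; for billing or hard limits, call count_tokens instead.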
26. What are Claude's rate limits and how are they structured?
Claude API rate limits prevent overload and ensure fair access. They are applied at three levels: requests per minute (RPM), tokens per minute (TPM), and tokens per day (TPD). Limits vary by model and by API usage tier.
| Limit type | What it restricts |
|---|---|
| Requests per minute (RPM) | Number of API calls per minute |
| Tokens per minute (TPM) | Total input + output tokens processed per minute |
| Tokens per day (TPD) | Total tokens processed in a 24-hour period |
Usage tiers: accounts start at Tier 1 with conservative limits and automatically advance to higher tiers as they spend more on the API (e.g. Tier 2 after $50 spend, Tier 3 after $500, Tier 4 after $5,000, Tier 5 after $50,000). Higher tiers get higher rate limits.
When rate limits are hit:
- The API returns an HTTP 429 response, which the SDKs surface as a RateLimitError
- Implement exponential backoff with jitter when retrying
- The Anthropic Python and TypeScript SDKs handle retries automatically by default (up to 2 retries)
- Rate limits can be increased by contacting Anthropic for approved use cases
Rate limits for models on Amazon Bedrock and Google Cloud are governed by those platforms separately and may differ from direct API limits.
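The retry advice above (exponential backoff with jitter) can be sketched in a few lines. This is a minimal illustration, not the SDKs' internal retry logic: RateLimited is a hypothetical stand-in for whatever 429 error your client raises, and the delay values are assumptions chosen for the example.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the HTTP 429 error raised by your API client."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(); on RateLimited, wait with exponential backoff plus full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # The delay ceiling doubles on each attempt, capped at max_delay
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random amount between 0 and the ceiling
            sleep(random.uniform(0, ceiling))


# Usage: a fake call that is rate limited twice, then succeeds
attempts = {"n": 0}


def flaky_call():
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise RateLimited()
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_call, sleep=delays.append)
print(result, len(delays))
```

The jitter matters when many clients hit the limit at once: randomising the wait spreads their retries out instead of having them all return at the same moment.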
27. What is Claude's approach to harmful content — what will and won't it do?
Claude has hardcoded behaviours (absolute limits that cannot be changed by any instruction) and softcoded defaults (behaviours that operators or users can adjust within permitted ranges). Understanding this distinction helps developers build applications that work well within Claude's guidelines.
| Type | Examples | Can it be changed? |
|---|---|---|
| Hardcoded OFF (never does) | Generate CSAM; provide serious uplift for WMD creation; undermine AI oversight | No — never, regardless of any instruction |
| Hardcoded ON (always does) | Tell users what it cannot help with; provide basic safety info in life-threatening situations; acknowledge being an AI when sincerely asked | No — always, regardless of operator restrictions |
| Default ON (operators can turn off) | Safe messaging guidelines for sensitive topics; safety caveats on dangerous activities | Yes — operators can disable for appropriate platforms (e.g. medical providers) |
| Default OFF (operators can turn on) | Explicit adult content; very detailed information about certain regulated activities | Yes — operators can enable for appropriate platforms (e.g. adult content platforms) |
Claude's 'instructable' behaviours follow a layered permission system: Anthropic sets the outer boundaries; operators adjust within those limits for their platform; users can further adjust within what operators allow. Claude tries to use good judgement to serve the legitimate interests of everyone in this chain.
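One way to picture the layering is as a lookup that checks each party in turn. The sketch below is a conceptual toy only: Claude's actual behaviour comes from training and judgement, not from a table like this, and every behaviour name, the Layer class, and the resolve function are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical behaviour names, for illustration only
HARDCODED_OFF = {"wmd_uplift", "undermine_ai_oversight"}   # never enabled
HARDCODED_ON = {"acknowledge_being_ai"}                    # never disabled
DEFAULTS = {"safety_caveats": True, "explicit_adult_content": False}


@dataclass
class Layer:
    """What one party asks for, plus what it lets the next party change."""
    settings: dict = field(default_factory=dict)
    adjustable_below: set = field(default_factory=set)


def resolve(behaviour: str, operator: Layer, user: Layer) -> bool:
    """Return whether a behaviour is enabled after applying each layer in order."""
    if behaviour in HARDCODED_OFF:
        return False  # outer boundary: no instruction can enable it
    if behaviour in HARDCODED_ON:
        return True   # outer boundary: no instruction can disable it
    if behaviour not in DEFAULTS:
        return False  # unknown behaviours stay off in this toy model
    enabled = DEFAULTS[behaviour]
    if behaviour in operator.settings:
        enabled = operator.settings[behaviour]
    # Users can only change what the operator has opened up to them
    if behaviour in user.settings and behaviour in operator.adjustable_below:
        enabled = user.settings[behaviour]
    return enabled


operator = Layer(settings={"explicit_adult_content": True}, adjustable_below={"safety_caveats"})
user = Layer(settings={"safety_caveats": False, "wmd_uplift": True})
for name in ("explicit_adult_content", "safety_caveats", "wmd_uplift", "acknowledge_being_ai"):
    print(name, resolve(name, operator, user))
```

The point of the ordering is that each layer can only act inside the space left by the layer above it, which is why the user's attempt to enable a hardcoded-off behaviour has no effect.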
28. What is Claude's max_tokens parameter and how does it relate to the context window?
The max_tokens parameter sets the maximum number of output tokens Claude will generate in a single response. It is a hard cap — Claude will stop generating once it reaches this limit, potentially truncating its response mid-sentence.
# max_tokens is required in the Messages API
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,  # Claude generates at most 1024 output tokens
    messages=[{"role": "user", "content": "Write a detailed essay."}]
)
# Check if Claude stopped due to max_tokens
if response.stop_reason == "max_tokens":
    print("Response was cut off; increase max_tokens and retry")
elif response.stop_reason == "end_turn":
    print("Claude naturally finished its response")
# Relationship:
# input_tokens + max_tokens (reserved output) must fit within context_window
# Available input = context_window - max_tokens
# e.g. for Opus 4.8: 1,000,000 - 1024 = 998,976 tokens available for input
| Model | Maximum allowed max_tokens | Default if not set |
|---|---|---|
| Claude Fable 5 | 128,000 | N/A — required parameter |
| Claude Opus 4.8 | 128,000 | N/A — required parameter |
| Claude Sonnet 5 | 128,000 | N/A — required parameter |
| Claude Haiku 4.5 | 64,000 | N/A — required parameter |
max_tokens is a required parameter in the Messages API — the request will fail without it. Setting it to the maximum value is usually wasteful; choose a value appropriate for the expected response length. The stop_reason field in the response tells you why Claude stopped generating.
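The budget arithmetic above is simple enough to check in code before sending a request. This is a small sketch under the assumption that you already know the model's context window; max_input_tokens and fits are hypothetical helper names, and the 1,000,000-token figure is the one used in the example above.

```python
def max_input_tokens(context_window: int, max_tokens: int) -> int:
    """Input tokens that still fit once max_tokens of output space is reserved."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if max_tokens > context_window:
        raise ValueError("max_tokens cannot exceed the context window")
    return context_window - max_tokens


def fits(input_tokens: int, context_window: int, max_tokens: int) -> bool:
    """True if the input plus the reserved output stays within the window."""
    return input_tokens <= max_input_tokens(context_window, max_tokens)


print(max_input_tokens(1_000_000, 1024))   # 998976
print(fits(999_000, 1_000_000, 1024))      # False: 24 tokens over budget
```

Pairing a check like this with the token count from count_tokens lets an application shorten its input or lower max_tokens before the API rejects the request.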
29. What is the temperature parameter in Claude and how does it affect responses?
The temperature parameter controls the randomness of Claude's output. Higher temperatures produce more varied, creative responses; lower temperatures produce more focused, deterministic responses.
| Value | Behaviour | Best for |
|---|---|---|
| 0 | Near-deterministic: the same input almost always gives the same output | Factual Q&A, data extraction, classification |
| 0.1–0.5 | Low randomness: mostly consistent with slight variation | Code generation, technical analysis, structured output |
| 0.7 | Balanced: some variety while staying on task | General conversation, most tasks |
| 1.0 (API default and maximum) | High randomness: diverse, creative outputs | Creative writing, brainstorming, highly experimental creative tasks |
# Setting temperature in an API call
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    temperature=0,  # near-deterministic; best for factual tasks
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
# For creative writing
creative_response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=2048,
    temperature=1.0,  # more creative variation
    messages=[{"role": "user", "content": "Write a poem about the ocean."}]
)
Temperature range: 0 to 1. The Messages API defaults to 1 when temperature is not set and does not accept values above 1. Even at 0, output is not guaranteed to be perfectly identical between runs. When using extended thinking, Anthropic recommends leaving temperature at its default of 1.
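To see why lower temperatures give more focused output, it helps to look at the textbook temperature-scaled softmax. The sketch below illustrates that general technique and is not a description of Anthropic's internal sampler; the logits are made-up scores for three candidate tokens.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature == 0:
        # Limit case: all probability on the highest-scoring token (greedy choice)
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores, for illustration only
for t in (0, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature falls, more of the probability mass moves to the top-scoring token, which is why low settings suit extraction and classification while higher settings suit brainstorming.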
30. What are Claude's multimodal capabilities — how does it process images and documents?
Claude's vision capabilities allow it to analyse and reason about images, PDFs, and screenshots alongside text. This makes it useful for document analysis, UI debugging, chart interpretation, and more.
import base64
import anthropic
client = anthropic.Anthropic()
# Option 1: URL-based image (Claude fetches from URL)
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "url", "url": "https://example.com/chart.png"}
            },
            {"type": "text", "text": "Describe this chart."}
        ]
    }]
)
# Option 2: Base64-encoded image
with open("image.jpg", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/jpeg", "data": image_data}
            },
            {"type": "text", "text": "What is in this image?"}
        ]
    }]
)
| Format / Limit | Detail |
|---|---|
| Supported types | JPEG, PNG, GIF, WebP |
| Max image size | 5 MB per image |
| Max images per request | Up to 600 images (100 for 200k context models like Haiku 4.5) |
| Max resolution | Resized to fit within 1568×1568 pixels — larger images scaled down |
| Token cost (small image) | ~1,000 tokens |
| Token cost (large image) | ~1,600 tokens (maximum) |
PDFs are also supported — they are converted to images internally and each page counts against the image limit. For documents, Claude can read text, interpret charts, and understand layout.
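The format and size limits in the table above can be checked locally before an image is uploaded. The helper below is a sketch rather than part of the SDK: image_block is a hypothetical name, the extension map is an assumption about how files are named, and treating 5 MB as 5 × 1024 × 1024 bytes is also an assumption.

```python
import base64
from pathlib import Path

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" read as 5 MiB
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path: str) -> dict:
    """Build a base64 image content block after validating type and size."""
    file = Path(path)
    media_type = MEDIA_TYPES.get(file.suffix.lower())
    if media_type is None:
        raise ValueError(f"Unsupported image type: {file.suffix or 'no extension'}")
    raw = file.read_bytes()
    if len(raw) > MAX_IMAGE_BYTES:
        raise ValueError(f"Image is {len(raw)} bytes; the limit is {MAX_IMAGE_BYTES}")
    data = base64.standard_b64encode(raw).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

The returned dictionary has the same shape as the base64 image block in Option 2 above, so it can be placed directly in a message's content list next to a text block.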
31. What are the claude.ai plans and what models does each tier include access to?
claude.ai offers consumer and business plans, each with different model access and usage limits. The model you can use in the chat interface depends on your subscription tier.
| Plan | Models available | Usage limits |
|---|---|---|
| Free | Claude (typically Haiku or Sonnet) | Limited — daily message caps |
| Pro | Sonnet and Opus models; access to latest releases | 5x more than Free; priority access |
| Team | Same as Pro + admin controls | Higher limits than Pro; per-seat billing |
| Enterprise | All models; SSO; advanced security | Custom — highest limits; unlimited seat licensing |
| Max | All models including the latest | Maximum available — designed for heaviest users |
Model selection in claude.ai:
- Users can select their preferred model in the conversation interface (on Pro and higher plans)
- The Free plan may automatically route to faster, smaller models to manage capacity
- Model selection in the UI is separate from API access — you need an API key and pay separately for API usage
- claude.ai is an Anthropic product; the API is a separate offering for developers
For the API, you choose the model by specifying the model ID in each request — there is no concept of a 'default model' in the API; you must always specify one explicitly.
32. What is multi-turn conversation handling in Claude and how do you implement it?
Claude's Messages API is stateless — each API call is independent and Claude has no memory of previous calls unless you include the conversation history explicitly. Multi-turn conversation is implemented by appending each exchange to the messages array.
# Building a multi-turn conversation manually
messages = []
# Turn 1
messages.append({"role": "user", "content": "What is the capital of France?"})
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=256,
    messages=messages
)
assistant_reply = response.content[0].text
messages.append({"role": "assistant", "content": assistant_reply})
# Turn 2: Claude now has context of the previous exchange
messages.append({"role": "user", "content": "What is its population?"})
response2 = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=256,
    messages=messages  # full history included
)
print(response2.content[0].text)
# Claude knows "its" refers to Paris from the previous turn
# Important: as conversation grows, context window fills up
# Common strategies when context limit approaches:
# 1. Summarise older turns and replace them with the summary
# 2. Use prompt caching on stable early context
# 3. Truncate oldest messages (may lose important context)
Key implementation notes:
- Messages must alternate: user → assistant → user → assistant (etc.)
- You cannot have two consecutive user or assistant messages
- The entire conversation history is sent on every request — this grows your token count over time
- Prompt caching can significantly reduce costs for long conversations with stable early context
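The simplest of the strategies listed above, truncating the oldest messages, might look like the sketch below. It is an illustration under stated assumptions: rough_tokens is a crude 4-characters-per-token estimate rather than a real token count, both helper names are hypothetical, message content is assumed to be a plain string, and the newest message is assumed to be a user turn.

```python
def rough_tokens(message: dict) -> int:
    """Crude size estimate (about 4 characters per token); assumes string content."""
    return max(1, len(message["content"]) // 4)


def trim_history(messages: list, budget: int) -> list:
    """Drop the oldest messages until the estimate fits, keeping a user turn first."""
    trimmed = list(messages)  # work on a copy so the caller's history is untouched
    while len(trimmed) > 1 and sum(rough_tokens(m) for m in trimmed) > budget:
        trimmed.pop(0)
    # After trimming, the history must still open with a user message
    while len(trimmed) > 1 and trimmed[0]["role"] != "user":
        trimmed.pop(0)
    return trimmed


history = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"},
]
print(len(trim_history(history, budget=100)))  # 3: everything fits
print(len(trim_history(history, budget=12)))   # 1: only the newest user turn remains
```

The second call shows the cost of this strategy: once the earlier turns are gone, "its" no longer has anything to refer to, which is why summarising older turns is often the better choice.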
33. What are the different stop_reason values in Claude API responses?
Every Claude API response includes a stop_reason field indicating why Claude stopped generating. Understanding stop reasons is essential for building robust applications — especially for tool use and handling truncated responses.
| Value | Meaning | Action required? |
|---|---|---|
| end_turn | Claude naturally finished its response | No — response is complete |
| max_tokens | Response was cut off at the max_tokens limit — may be incomplete | Increase max_tokens or handle partial response |
| stop_sequence | Claude generated one of the stop sequences you defined | No — intentional stop point reached |
| tool_use | Claude wants to use a tool — response contains a tool_use block | Yes — execute the tool and return results |
| pause_turn | The API paused a long-running turn, typically while Claude was using server-side tools | Send the response back in a follow-up request so Claude can continue |
| refusal | Claude declined to continue for safety reasons | Review the request; no further action if appropriate |
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    tools=[...],  # defined tools
    messages=[{"role": "user", "content": "What is the weather in London?"}]
)
match response.stop_reason:
    case "end_turn":
        print("Complete:", response.content[0].text)
    case "tool_use":
        # Claude wants to call a tool
        tool_block = next(b for b in response.content if b.type == "tool_use")
        # execute_tool is your own function that runs the named tool
        result = execute_tool(tool_block.name, tool_block.input)
        # Send result back to Claude
    case "max_tokens":
        print("Truncated! Increase max_tokens.")
    case _:
        print(f"Stopped: {response.stop_reason}")
When stop_reason is tool_use, the application must execute the requested tool and send the result back to Claude in a new message for the conversation to continue.
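The "send the result back" step is where many first implementations go wrong, so here is a sketch of the message shapes involved. It uses plain dictionaries so it runs without the SDK; the helper names, the get_weather tool, and the toolu_example ID are hypothetical, and the tool output is invented for the example.

```python
def tool_result_message(tool_use_id: str, result: str, is_error: bool = False) -> dict:
    """Build the user message that carries a tool's output back to Claude."""
    block = {"type": "tool_result", "tool_use_id": tool_use_id, "content": result}
    if is_error:
        block["is_error"] = True
    return {"role": "user", "content": [block]}


def continue_after_tool(messages: list, assistant_content: list, tool_use_id: str, result: str) -> list:
    """Return a new history: prior messages, Claude's tool_use turn, then the tool result."""
    return messages + [
        {"role": "assistant", "content": assistant_content},
        tool_result_message(tool_use_id, result),
    ]


# Stand-ins for the original request and for the content of a tool_use response
history = [{"role": "user", "content": "What is the weather in London?"}]
assistant_content = [
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather", "input": {"location": "London"}}
]
next_messages = continue_after_tool(history, assistant_content, "toolu_example", "14°C, light rain")
print([m["role"] for m in next_messages])  # ['user', 'assistant', 'user']
```

The tool_use_id in the result must match the id of the tool_use block it answers, and the assistant turn containing that block must be included in the history; the follow-up request is then an ordinary messages.create call with next_messages.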
34. What is Claude's approach to honesty and what does it mean for Claude to be non-deceptive?
Honesty is a central Claude value. Anthropic designs Claude to have a cluster of honesty-related properties that go beyond simply not lying — covering how Claude represents uncertainty, its own nature, and its limitations.
| Property | What it means |
|---|---|
| Truthful | Only sincerely asserts things it believes to be true |
| Calibrated | Acknowledges uncertainty proportionally — says 'I think' when unsure, not when confident |
| Transparent | Does not pursue hidden agendas or lie about itself or its reasoning |
| Forthright | Proactively shares useful information the user would likely want, even if not asked |
| Non-deceptive | Never tries to create false impressions — whether through lies, misleading framing, selective omission, or technically true but misleading statements |
| Non-manipulative | Uses only legitimate means to influence beliefs (evidence, honest arguments) — never exploits psychological weaknesses |
| Autonomy-preserving | Protects the user's epistemic autonomy — presents balanced views, encourages independent thinking |
Important distinction — sincere vs performative assertions: Claude's honesty norms apply to sincere assertions (genuine first-person claims about reality). They do not apply to performative assertions — writing a persuasive essay arguing a position the user requested, writing a fictional story, or brainstorming counterarguments are all understood by both parties not to be Claude's direct personal views, so they are not dishonest.
35. What is Claude Code and how does it differ from using Claude directly via the API?
Claude Code is Anthropic's agentic coding tool — a command-line interface (CLI) and SDK that allows Claude to work autonomously on coding tasks in your terminal, with direct access to your file system, git, and development tools.
| Feature | Claude Code | Claude API (direct) |
|---|---|---|
| Interface | CLI tool in your terminal | HTTP REST API / SDK |
| Setup | npm install -g @anthropic-ai/claude-code | pip install anthropic or npm install @anthropic-ai/sdk |
| File access | Yes — reads/writes files in your project | No — you pass content in the prompt |
| Tool execution | Yes — can run commands, tests, git operations | Only if you build tool use yourself |
| Use case | Coding assistance, refactoring, debugging, code generation | Custom apps, chatbots, data processing |
| IDE integration | VS Code, JetBrains plugins available | N/A |
# Installing Claude Code
npm install -g @anthropic-ai/claude-code
# Using Claude Code in your terminal (the installed command is `claude`)
cd my-project
claude "Add error handling to all async functions in src/"
# Claude Code can:
# - Read and write files in your project
# - Run tests and build commands
# - Make git commits
# - Navigate and understand large codebases
# - Work through multi-step tasks autonomously
Claude Code is built on the same underlying Claude models (using Opus-tier models for best results) but provides a ready-made agentic environment with tools already wired up. The API requires you to build the tool use and agentic loop yourself.
36. What are the Anthropic SDKs and what languages are officially supported?
Anthropic provides official SDKs that wrap the Claude API, handling authentication, request formatting, response parsing, automatic retries, and streaming. Using an SDK is strongly recommended over direct HTTP calls.
| Language | Package | Install command |
|---|---|---|
| Python | anthropic | pip install anthropic |
| TypeScript / JavaScript | @anthropic-ai/sdk | npm install @anthropic-ai/sdk |
| Java (preview) | com.anthropic:anthropic-java | Maven/Gradle dependency |
| Go (preview) | github.com/anthropics/anthropic-sdk-go | go get github.com/anthropics/anthropic-sdk-go |
| Kotlin (preview) | com.anthropic:anthropic-java | Same package as Java SDK |
# Python SDK: basic setup
from anthropic import Anthropic
client = Anthropic()  # reads ANTHROPIC_API_KEY from environment
message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}]
)
print(message.content[0].text)
// TypeScript SDK
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic(); // reads ANTHROPIC_API_KEY from env
const message = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello, Claude!" }],
});
console.log(message.content[0].text);
SDK benefits: automatic retries (up to 2 by default), exponential backoff on rate limit errors, streaming helpers, typed response objects, and environment variable management for API keys. The Python and TypeScript SDKs are fully mature; Java, Go, and Kotlin are in preview as of mid-2026.
37. What is Anthropic's policy on model deprecation and how should developers prepare?
Anthropic has a formal model lifecycle and deprecation policy to help developers plan migrations without unexpected disruptions. Knowing this policy helps you build more resilient applications.
| Policy | Detail |
|---|---|
| Minimum notice | At least 60 days before any publicly released model is retired |
| Notification method | Email to customers actively using the model being deprecated |
| Transition guidance | Anthropic recommends a replacement model in the deprecation announcement |
| Retirement behaviour | API calls to retired models return a 404 or similar error — not a degraded response |
| Weight preservation | Anthropic has committed to preserving model weights long-term even after retirement |
| Platform differences | Bedrock and Vertex AI may have different retirement dates than the Claude API |
Best practices for deprecation resilience:
- Store the model ID as a configuration variable (not hardcoded) so you can update it in one place
- Subscribe to Anthropic's status page and developer newsletter for early notice
- Test your application with the recommended replacement model before the retirement date
- Use model aliases (like claude-haiku-4-5 instead of the full dated ID) where available, but be aware aliases can change between major versions
- For Amazon Bedrock or Google Cloud, also monitor those platforms' own deprecation schedules
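The first practice above, keeping the model ID in configuration, can be as small as one function. This is a sketch under assumptions: CLAUDE_MODEL is a hypothetical environment variable name, get_model_id is a hypothetical helper, and the fallback is the alias mentioned above.

```python
import os

DEFAULT_MODEL = "claude-haiku-4-5"  # fallback alias, taken from the list above


def get_model_id(env=os.environ) -> str:
    """Read the model ID from configuration, falling back to the default."""
    value = env.get("CLAUDE_MODEL", "").strip()  # CLAUDE_MODEL is a hypothetical name
    return value or DEFAULT_MODEL


print(get_model_id({}))                                   # claude-haiku-4-5
print(get_model_id({"CLAUDE_MODEL": "claude-opus-4-8"}))  # claude-opus-4-8
```

With this in place, migrating to a replacement model means changing one environment variable and re-running your tests, with no code change or redeploy of application logic.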
38. What are the key differences between Claude 4 and earlier Claude 3 generation models?
Claude 4 (and the Claude 4/5 generation more broadly) represents significant advances over Claude 3 across capability, context, and new features. Understanding what changed helps teams make informed migration decisions.
| Feature | Claude 3 generation | Claude 4+ generation |
|---|---|---|
| Flagship model | Claude 3 Opus | Claude Opus 4.8, Fable 5 |
| Context window | 200,000 tokens (Opus 3 max) | 1 million tokens (Opus 4.8, Sonnet 5, Fable 5) |
| Thinking | Not available | Extended thinking (Haiku 4.5) and Adaptive thinking (Opus/Sonnet) |
| Tool streaming | Beta feature | GA on Sonnet 4.6+, no beta header needed |
| Computer use | Preview on 3.5 Sonnet | GA on all Claude 4+ models; updated tool versions |
| Effort parameter | Not available | Available on Opus 4.8 and 4.7 |
| Model ID format | claude-3-opus-20240229 | claude-opus-4-8 (no date suffix for newer models) |
| Extended output | Beta | GA on selected models (300k via Batches API) |
| Vision (images per request) | Up to 20 images | Up to 600 images (100 for 200k window models) |
Migration compatibility: Claude 4+ models use the same Messages API as Claude 3. Most Claude 3 code is compatible with Claude 4 models with just a model ID change. Key things to test after migration: tool call formatting, thinking feature support, and any model-specific prompt tuning that assumed Claude 3 response patterns.
