What is an API?

An API (Application Programming Interface) is a contract that lets two pieces of software talk to each other without knowing each other's internals. You send a structured request and get a structured response back; you don't need to know how the other system works inside, only what to send and what you'll receive.

When you call Claude, you're using an API. You send a request (your prompt, model name, settings). The Anthropic servers process it and send a response (the generated text, token counts). You never see how Claude actually works. The API is the boundary.


The Basic Pattern

Every API interaction follows the same shape:

You                       The API server
 │                              │
 │──── request ────────────────►│
 │     (what you want)          │  (does the work)
 │                              │
 │◄─── response ────────────────│
 │     (what you got back)      │

In web APIs (the kind you'll use for AI), requests and responses travel over HTTP, the same protocol your browser uses to load web pages.


A Real Example

Calling the Anthropic API to generate text:

import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")

# Request: tell the server what you want
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=100,
    messages=[{"role": "user", "content": "What is 2 + 2?"}]
)

# Response: what came back
print(response.content[0].text)  # "2 + 2 equals 4."

You sent a request with a model name and a message. The server processed it and sent back the generated text. You didn't need to know anything about how Claude works internally.


HTTP Methods

Web APIs use HTTP verbs to signal intent:

Method    Meaning                       Example
GET       Read something                Fetch a list of your past conversations
POST      Create or trigger something   Send a message to Claude
PUT       Replace something             Update your settings
DELETE    Remove something              Delete a conversation

Most AI API calls use POST. You're submitting data to trigger work.


What Goes in a Request

URL (endpoint): Where to send the request.
https://api.anthropic.com/v1/messages

Headers: Metadata about the request, such as who you are and what format you're sending. The Anthropic API authenticates with an x-api-key header (many other web APIs use Authorization: Bearer instead) and requires a version header:

x-api-key: sk-ant-...
anthropic-version: 2023-06-01
Content-Type: application/json

Body: The actual data, usually as JSON.

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "messages": [{"role": "user", "content": "Hello"}]
}
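The same three pieces can be assembled by hand. A sketch using only the standard library, with the header names from Anthropic's docs; the actual send is left as a comment so nothing goes over the network:

```python
import json

# Assembling the request by hand (sketch; header names per Anthropic's docs)
url = "https://api.anthropic.com/v1/messages"
headers = {
    "x-api-key": "sk-ant-...",          # your real key here
    "anthropic-version": "2023-06-01",  # required version header
    "content-type": "application/json",
}
body = json.dumps({
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}],
})

# To actually send it, you'd POST `body` to `url` with `headers` via
# urllib.request or the requests library. The SDK does exactly this for you.
```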

What Comes Back

Status code: A number indicating success or failure.

Code   Meaning
200    Success
400    Bad request (you sent something wrong)
401    Unauthorized (wrong or missing API key)
429    Rate limited (too many requests)
500    Server error (their problem, not yours)
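These classes map naturally onto branching logic. A minimal triage sketch (the messages are illustrative, not from any SDK):

```python
def classify(status: int) -> str:
    """Rough triage by status code class."""
    if 200 <= status < 300:
        return "success"
    if status == 429:                      # check before the generic 4xx case
        return "rate limited: slow down and retry"
    if 400 <= status < 500:
        return "client error: fix the request"
    if 500 <= status < 600:
        return "server error: retry later"
    return "unexpected"
```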

Body: The response data, again usually JSON.

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "content": [{"type": "text", "text": "Hello! How can I help?"}],
  "usage": {"input_tokens": 10, "output_tokens": 8}
}
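Pulling the useful fields out of that JSON takes only a few lines. A sketch using the response shown above:

```python
import json

raw = """{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "content": [{"type": "text", "text": "Hello! How can I help?"}],
  "usage": {"input_tokens": 10, "output_tokens": 8}
}"""

data = json.loads(raw)

# content is a list of blocks; concatenate the text ones (tool use adds other types)
text = "".join(b["text"] for b in data["content"] if b["type"] == "text")

usage = data["usage"]
total_tokens = usage["input_tokens"] + usage["output_tokens"]
```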

API Keys

APIs need to know who you are, mostly so they can bill you and enforce rate limits. You prove identity with an API key: a long secret string that acts like a password for your account.

# Never hard-code the key in your code
import os
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

Keep API keys in environment variables, never in source code. If you commit a key to GitHub it will be found and abused within minutes.
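A small helper that fails loudly when the variable is missing saves a confusing auth error later. A sketch (the function name is mine, not from any SDK):

```python
import os

def load_api_key(name: str = "ANTHROPIC_API_KEY") -> str:
    """Read an API key from the environment, failing with a clear message."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running")
    return key
```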


Rate Limits

APIs limit how fast you can call them, both to protect their servers and to ensure fair access. You'll see limits like:

  • Requests per minute (RPM): how many calls you can make per minute
  • Tokens per minute (TPM): total tokens (input + output) per minute

When you exceed a limit, you get a 429 error. The fix: wait and retry, or use exponential backoff:

import time

for attempt in range(5):
    try:
        response = client.messages.create(...)
        break
    except anthropic.RateLimitError:
        if attempt == 4:
            raise  # out of retries; surface the error
        time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, 8s

REST vs Other Styles

Most AI APIs are REST APIs. They use HTTP verbs and URLs to represent actions on resources. REST is the dominant style because it's simple and works everywhere.

You may also encounter:

  • GraphQL: one endpoint, you specify exactly what data you want in the query (GitHub's API uses this)
  • gRPC: binary protocol, much faster, used for service-to-service communication (see java/grpc)
  • WebSockets: persistent connection for real-time streaming (GPT-4o Realtime API uses this)

For LLM APIs, REST + Server-Sent Events (SSE) for streaming is the standard pattern.


Key Facts

  • API = contract between two pieces of software: defined request format, defined response format
  • Web APIs travel over HTTP; most use JSON for request/response bodies
  • Status codes: 2xx = success, 4xx = your error, 5xx = their error
  • API keys authenticate you — treat them like passwords, store in environment variables
  • Rate limits are per account; 429 means slow down; use exponential backoff
  • REST is the dominant API style; most AI APIs are REST + SSE for streaming

Common Failure Cases

API key committed to source control and abused within minutes
Why: developers test locally by hardcoding keys and accidentally commit them; automated scanners find them almost instantly.
Detect: receive an unexpected billing alert or see API usage you didn't make; GitHub sends a secret scanning alert.
Fix: rotate the key immediately in the provider's dashboard; use git filter-repo to scrub the history; move all keys to env vars or a secrets manager.

Requests hang indefinitely when the server doesn't respond
Why: no timeout is set on the HTTP client; a stalled server leaves the connection open and the client waits forever.
Detect: the process hangs on an API call with no error; logs show no progress for >30 seconds.
Fix: always set timeout on HTTP clients (httpx.Client(timeout=30.0)); handle httpx.TimeoutException with a retry or fallback.

Exponential backoff retries hit the max before the rate limit window resets
Why: the rate limit window is 60 seconds; with 3 retries at 1s/2s/4s, total wait is 7 seconds — too short.
Detect: all retries exhaust before the 429 window resets; the final retry still gets a 429.
Fix: read the Retry-After header from 429 responses; back off for at least that duration instead of using a fixed exponential schedule.
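That fix can be sketched as a delay function that prefers the server's hint; the function name and the jitter choice are mine, not from any SDK:

```python
import random

def backoff_delay(attempt: int, retry_after=None) -> float:
    """Honor a Retry-After header value if present; else exponential backoff."""
    if retry_after is not None:
        return float(retry_after)            # server told us exactly how long
    return (2 ** attempt) + random.random()  # jitter avoids synchronized retries
```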

JSON body sent without Content-Type: application/json header causes 400 error
Why: some API servers reject requests with JSON body but missing or wrong Content-Type.
Detect: HTTP 400 Bad Request or 415 Unsupported Media Type when the body looks correct.
Fix: always set Content-Type: application/json in headers; most SDK clients do this automatically but raw requests calls sometimes forget.
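The difference is visible without sending anything. A sketch with the standard library's urllib (with the requests library, `json=` sets this header for you; `data=` with a pre-serialized string does not):

```python
import json
import urllib.request

body = json.dumps({"model": "claude-sonnet-4-6", "max_tokens": 1024}).encode()

# A Request object with a JSON body but no explicit Content-Type yet
req = urllib.request.Request("https://api.anthropic.com/v1/messages", data=body)

# Set it explicitly; this is what SDK clients do automatically
req.add_header("Content-Type", "application/json")
```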

Streaming SSE response parsed as a single chunk, missing intermediate tokens
Why: requests.get(...).text buffers the entire response; for streaming endpoints the response is never complete until the stream ends.
Detect: no output appears until the full generation is complete; time-to-first-token is equal to total generation time.
Fix: use stream=True with requests and iterate response.iter_lines(); or use the provider SDK's streaming method.
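A minimal sketch of what line-by-line SSE parsing looks like. The `data:` framing is from the SSE spec; the `[DONE]` sentinel is a convention some providers use, and the canned list here stands in for `response.iter_lines()`:

```python
def iter_sse_data(lines):
    """Yield the payload of each `data:` line from an SSE stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":
                yield payload

# Canned stream in place of a live network response
stream = [
    "event: message",
    'data: {"text": "Hel"}',
    "",
    'data: {"text": "lo"}',
    "data: [DONE]",
]
chunks = list(iter_sse_data(stream))
```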

Connections

Open Questions

  • What API behaviour is underdocumented but critical for production use?
  • When should you build an abstraction layer over this API versus calling it directly?