Google AI API (Gemini)

Google's Gemini API covers both Google AI Studio (developer) and Vertex AI (enterprise GCP) entry points, with the largest context window of any commercial model and competitive pricing for high-volume workloads.

Updated Invalid Date·

google gemini vertex-ai google-ai-studio gemini-api multimodal

Google's LLM API surface spans two entry points: Google AI Studio (direct API, developer-friendly) and Vertex AI (enterprise, GCP-integrated). Both serve the Gemini model family. 650M monthly active Gemini users as of April 2026.

Models (April 2026)

Model	Context	Strength	Pricing (in/out per M)
gemini-2.5-pro	1M	Best reasoning, long context	$1.25 / $10
gemini-2.5-flash	1M	Fast, cheap, thinking mode	$0.15 / $0.60
gemini-2.0-flash	1M	Previous gen, stable	$0.10 / $0.40
gemini-1.5-pro	2M	Largest context window	$1.25 / $5
text-embedding-004	2K	Embeddings	$0.025 / —

Gemini 2.5 Pro has the highest context window of any commercial model at 2M tokens (Gemini 1.5 Pro). Strong on coding and reasoning; competitive with Claude Opus and o3.

Setup

pip install google-generativeai          # Google AI Studio SDK
pip install google-cloud-aiplatform      # Vertex AI SDK

Get an API key from Google AI Studio (aistudio.google.com).

Google AI Studio SDK

import google.generativeai as genai

genai.configure(api_key="GOOGLE_API_KEY")  # or GOOGLE_API_KEY env var

model = genai.GenerativeModel(
    model_name="gemini-2.5-pro",
    system_instruction="You are a helpful assistant.",
)

# Simple generation
response = model.generate_content("Explain attention mechanisms.")
print(response.text)

# With generation config
response = model.generate_content(
    "Write a haiku about transformers.",
    generation_config=genai.GenerationConfig(
        temperature=0.9,
        max_output_tokens=100,
    ),
)

Streaming

for chunk in model.generate_content("Tell me a story.", stream=True):
    print(chunk.text, end="", flush=True)

Multi-turn Chat

chat = model.start_chat(history=[])

response = chat.send_message("What is RAG?")
print(response.text)

response = chat.send_message("How does it compare to fine-tuning?")
print(response.text)

# Access history
for message in chat.history:
    print(f"{message.role}: {message.parts[0].text[:100]}")

Vision and Multimodal

import PIL.Image

model = genai.GenerativeModel("gemini-2.5-pro")

# Image from file
image = PIL.Image.open("diagram.png")
response = model.generate_content(["Explain this architecture diagram:", image])

# Image from URL
response = model.generate_content([
    "What's in this image?",
    {"mime_type": "image/jpeg", "data": base64_image_bytes},
])

# PDF analysis (Gemini handles PDFs natively)
with open("contract.pdf", "rb") as f:
    pdf_data = f.read()

response = model.generate_content([
    {"mime_type": "application/pdf", "data": pdf_data},
    "Summarise the key terms and obligations in this contract.",
])

Function Calling

def get_stock_price(ticker: str) -> dict:
    """Get current stock price."""
    return {"ticker": ticker, "price": 185.42, "currency": "USD"}

# Define tool
get_stock_tool = genai.protos.Tool(
    function_declarations=[
        genai.protos.FunctionDeclaration(
            name="get_stock_price",
            description="Get the current stock price for a given ticker symbol.",
            parameters=genai.protos.Schema(
                type=genai.protos.Type.OBJECT,
                properties={
                    "ticker": genai.protos.Schema(
                        type=genai.protos.Type.STRING,
                        description="Stock ticker symbol, e.g. AAPL, GOOGL",
                    )
                },
                required=["ticker"],
            ),
        )
    ]
)

model = genai.GenerativeModel("gemini-2.5-pro", tools=[get_stock_tool])
response = model.generate_content("What's Apple's current stock price?")

# Handle tool call
if response.candidates[0].content.parts[0].function_call:
    fc = response.candidates[0].content.parts[0].function_call
    result = get_stock_price(**dict(fc.args))
    
    # Send result back
    response = model.generate_content([
        "What's Apple's stock price?",
        response.candidates[0].content,
        genai.protos.Content(
            parts=[genai.protos.Part(
                function_response=genai.protos.FunctionResponse(
                    name=fc.name, response=result
                )
            )],
            role="function",
        ),
    ])

Structured Output (JSON Mode)

import json

model = genai.GenerativeModel(
    "gemini-2.5-flash",
    generation_config={"response_mime_type": "application/json"},
)

response = model.generate_content(
    "List the top 3 Python web frameworks with a brief description each. Return as JSON array."
)
data = json.loads(response.text)

Embeddings

result = genai.embed_content(
    model="models/text-embedding-004",
    content="What is the capital of France?",
    task_type="retrieval_query",  # or "retrieval_document", "semantic_similarity"
)
embedding = result["embedding"]  # list of 768 floats

Task types affect the embedding. Use retrieval_query for queries and retrieval_document for documents being indexed.

Vertex AI (Enterprise)

Vertex AI adds: IAM/VPC security, regional data residency, enterprise SLAs, audit logging, private endpoints.

import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-gcp-project", location="us-central1")

model = GenerativeModel("gemini-2.5-pro")
response = model.generate_content("Explain quantum computing.")
print(response.text)

Vertex AI uses Google Cloud credentials (Application Default Credentials) rather than API keys:

gcloud auth application-default login

LangChain Integration

from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-pro",
    google_api_key="GOOGLE_API_KEY",
    temperature=0.7,
)

embeddings = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004",
    google_api_key="GOOGLE_API_KEY",
)

Thinking Mode (Gemini 2.5)

Gemini 2.5 Pro/Flash support a thinking mode similar to Claude's extended thinking:

model = genai.GenerativeModel("gemini-2.5-pro")
response = model.generate_content(
    "Prove that there are infinitely many prime numbers.",
    generation_config=genai.GenerationConfig(
        thinking_config=genai.ThinkingConfig(thinking_budget=5000)
    ),
)
# response includes thinking steps + final answer

Google AI vs Anthropic vs OpenAI

Feature	Gemini 2.5 Pro	Claude Sonnet 4.6	GPT-4o
Max context	2M tokens	1M tokens	128K tokens
Pricing (input)	$1.25/M	$3/M	$2.50/M
Code (SWE-bench)	~70%+	79.6%	~73%
Google Workspace integration	Native	None	None
Multimodal	Strong	Strong	Strong

Gemini is the natural choice for teams already on GCP or deeply integrated with Google Workspace.

Key Facts

Gemini 2.5 Pro context window: 1M tokens; Gemini 1.5 Pro: 2M tokens (largest commercial)
650M monthly active Gemini users as of April 2026
Gemini 2.5 Pro pricing: $1.25/M input, $10/M output
text-embedding-004 output: 768 floats, $0.025/M tokens
Vertex AI adds IAM/VPC, regional data residency, audit logging, and private endpoints
task_type matters for embeddings: retrieval_query vs retrieval_document produce different vectors
Thinking mode configured via ThinkingConfig(thinking_budget=N) in generation config

Common Failure Cases

generate_content raises BlockedPromptException with no useful message
Why: Gemini's safety filters block the request; the error message often doesn't specify which filter triggered.
Detect: google.generativeai.types.BlockedPromptException; response.prompt_feedback.block_reason contains the actual reason.
Fix: check response.prompt_feedback.safety_ratings to identify the triggering category; adjust content or use SafetySettings to lower thresholds for the relevant category.

Vertex AI ADC credentials fail in GitHub Actions
Why: Application Default Credentials require gcloud auth application-default login which is interactive; GitHub Actions can't run it.
Detect: google.auth.exceptions.DefaultCredentialsError in CI; the workflow works locally but not in Actions.
Fix: use Workload Identity Federation (OIDC) in Actions; or set GOOGLE_APPLICATION_CREDENTIALS to a service account JSON key file stored in GitHub Secrets.

Thinking mode significantly exceeds expected token budget
Why: thinking_budget=5000 sets a maximum, not a minimum; complex reasoning problems may use the full budget, multiplying cost unexpectedly.
Detect: response.usage_metadata.thoughts_token_count is consistently near the budget limit; cost per call is higher than expected.
Fix: lower thinking_budget for simpler queries; use a tiered approach — start with Flash (no thinking), escalate to Pro with thinking only for hard problems.

Function calling loop fails because Gemini passes None for optional parameters
Why: Gemini includes optional parameters with null value in the function call payload; the Python function receiving None for a required-looking param raises an error.
Detect: tool function raises TypeError: expected str, got None; the function call in the trace shows null for an optional argument.
Fix: use Optional[str] = None type hints and handle None explicitly in the function body; or set "nullable": True in the schema.

text-embedding-004 task_type mismatch degrades retrieval quality by 5-10%
Why: using retrieval_document type for both indexing and querying, or omitting task_type, bypasses Google's asymmetric embedding training.
Detect: retrieval quality doesn't match Google's benchmarks; swap types and compare recall on a test set.
Fix: always use task_type="retrieval_query" for user queries and task_type="retrieval_document" for documents being indexed.

Connections

apis/anthropic-api — Anthropic API comparison (caching, tool use, extended thinking)
apis/openai-api — OpenAI API comparison (context window, pricing, function calling)
llms/model-families — Gemini 2.5 Pro/Flash in the broader model landscape
landscape/ai-labs — Google DeepMind's position and research agenda
rag/embeddings — text-embedding-004 vs Cohere vs OpenAI embeddings

Open Questions

How does Gemini 2M context quality degrade on retrieval tasks at full window utilisation?
What is the practical cost difference between Vertex AI and AI Studio for production workloads?
How does Gemini thinking mode quality compare to Claude extended thinking on complex reasoning benchmarks?