Nilesh Gule @nileshgule
API Management
in
AI Era
$whoami
{
  "name": "Nilesh Gule",
  "role": "Senior Cloud Solutions Architect at Avanade",
  "website": "https://www.HandsOnArchitect.com",
  "github": "https://GitHub.com/NileshGule",
  "twitter": "@nileshgule",
  "linkedin": "https://www.linkedin.com/in/nileshgule",
  "YouTube": "https://www.YouTube.com/@nilesh-gule",
  "likes": "Technical Evangelism, Cricket"
}
Mistral Prompt and Response Tokens
DeepSeek - Prompt and Response Tokens
API Management - GenAI Gateway
Azure-Samples/AI-Gateway: APIM
Challenges in using GenAI APIs
• Track token usage across multiple applications
• Ensure a single app doesn’t consume the whole TPM (tokens-per-minute) quota
• Distribute load across multiple endpoints
• Ensure committed capacity in PTUs is exhausted before falling back to the PAYG (pay-as-you-go) instance
Monitor utilization of models
• Sends token usage metrics to Application Insights
• Provides an overview of model utilization across multiple applications or API consumers
GenAI Gateway Capabilities in Azure API Management
Token Metrics Emitting – Token Metric Policy
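A minimal sketch of the Emit Token Metric policy in the API's inbound section. The dimension names shown are illustrative choices; the metrics land in the Application Insights instance already configured as the API's logger.

```xml
<policies>
    <inbound>
        <base />
        <!-- Emit prompt, completion, and total token counts as custom metrics,
             split by the dimensions below -->
        <azure-openai-emit-token-metric namespace="openai">
            <dimension name="Subscription ID" value="@(context.Subscription.Id)" />
            <dimension name="API ID" value="@(context.Api.Id)" />
        </azure-openai-emit-token-metric>
    </inbound>
</policies>
```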
GenAI Gateway Capabilities in Azure API Management
Token Metrics Emitting
GenAI Gateway Capabilities in Azure API Management
Enforce limits per consumer
• Manage and enforce limits per API consumer based on token usage
GenAI Gateway Capabilities in Azure API Management
Token Rate Limiting – Token Limit Policy
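A minimal sketch of the Token Limit policy, keyed per subscription. The 500 tokens-per-minute figure is an illustrative value, not a recommendation; `estimate-prompt-tokens` pre-counts prompt tokens so over-quota requests are rejected before reaching the backend.

```xml
<policies>
    <inbound>
        <base />
        <!-- Cap each API subscription at an illustrative 500 tokens per minute -->
        <azure-openai-token-limit
            counter-key="@(context.Subscription.Id)"
            tokens-per-minute="500"
            estimate-prompt-tokens="true"
            remaining-tokens-variable-name="remainingTokens" />
    </inbound>
</policies>
```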
GenAI Gateway Capabilities in Azure API Management
Token Rate Limiting
GenAI Gateway Capabilities in Azure API Management
Provisioned Throughput Units (PTU)
• Allows you to specify the amount of throughput required in a model deployment
• Granted to the subscription as quota
• Quota is region-specific and defines the maximum number of PTUs that can be assigned to deployments in that subscription and region
• PTU provides
• Predictable performance
• Allocated processing capacity
• Cost savings
Understanding costs associated with provisioned throughput units (PTU)
HA - Load Balanced Pool and Circuit Breaker
• Helps spread load across multiple Azure OpenAI endpoints
• Round-robin, weighted, or priority-based load distribution strategies
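On the policy side, routing to a load-balanced pool is a single statement; the pool membership, weights, priorities, and circuit breaker rules live on the APIM backend resource itself (defined via Bicep/ARM or the portal). The backend ID below is a hypothetical name:

```xml
<policies>
    <inbound>
        <base />
        <!-- Route requests to a load-balanced pool of Azure OpenAI backends.
             Weights/priorities and circuit breaker thresholds are configured
             on the "openai-backend-pool" backend resource, not in this policy -->
        <set-backend-service backend-id="openai-backend-pool" />
    </inbound>
</policies>
```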
GenAI Gateway Capabilities in Azure API Management
Load Balanced Pool and Circuit Breaker
GenAI Gateway Capabilities in Azure API Management
Load Balanced Pool and Circuit Breaker
GenAI Gateway Capabilities in Azure API Management
AI Gateway capabilities of Azure API Management
AI Gateway
Security & safety
• Keyless managed identities
• AI Apps & Agents Authorizations -New
• Content Safety -GA
• Credential Manager
Resiliency
• Weighted load balancing
• Priority routing to provisioned capacity models
• Backend pools with circuit breaker
• Session aware load balancing -GA
Scalability
• Token rate limits and token quotas
• Semantic Caching -GA
• Model load balancing
• Multi-regional deployments
Traffic mediation & control
• Azure AI Foundry & Azure OpenAI
• OpenAI compatible models -GA
• Responses API -GA
• WebSockets for Realtime APIs
• MCP server pass-through - Soon
• Expose APIs as built-in MCP server - Preview
Developer velocity
• Wizard policy configuration experience
• Self-service with the Developer Portal
• API Center Copilot Studio connector - Preview
• Policy Toolkit
Observability
• Token counting per consumer
• Prompts and completions logging -GA
• Built-in reporting dashboard -GA
Governance
• Policy engine with custom expressions
• API Center MCP server registry - Preview
• Federated API Management
Summary
• Track token usage across multiple applications
• Emit Token Metric policy
• Ensure a single app doesn’t consume the whole TPM quota
• Token Limit policy
• Distribute load across multiple endpoints
• Backend pool load balancing and circuit breaker
Resources
• Azure OpenAI Gateway topologies
• Azure OpenAI Token Limit Policy
• LLM Token Limit Policy
• Azure OpenAI Emit Token Metric Policy
• LLM Emit Token Metric Policy
• Houssem Dellai YouTube videos
• GenAI Labs
• Designing and implementing GenAI gateway solution
Nilesh Gule
ARCHITECT | MICROSOFT MVP
“Code with Passion and
Strive for Excellence”
nileshgule
@nileshgule Nilesh Gule
NileshGule
www.handsonarchitect.com
https://www.youtube.com/@nilesh-gule
Source Code & slide deck
Nilesh Gule fork - GenAI Labs
https://github.com/NileshGule/AI-Gateway
GenAI Labs
https://aka.ms/apim/genai/labs
https://speakerdeck.com/nileshgule/
https://www.slideshare.net/nileshgule/
Q&A

API Days Australia - API Management in the AI Era
