API Response Time Comparison

Compare API latency and response times across different LLM providers and models.

Note: These are approximate values based on typical performance; actual latency varies by region, load, and network conditions. Values represent streaming-mode performance and assume 500 expected output tokens per response.
| Rank | Model | Provider | First Token | Per Token | Total Time | Tokens/Second |
|------|-------|----------|-------------|-----------|------------|---------------|
| #1 | Gemini 1.5 Flash | Google | 180 ms | 18 ms | 9,180 ms (9.18 s) | 54.5 tok/s |
| #2 | GPT-3.5 Turbo | OpenAI | 200 ms | 20 ms | 10,200 ms (10.20 s) | 49.0 tok/s |
| #3 | Claude 3 Haiku | Anthropic | 220 ms | 22 ms | 11,220 ms (11.22 s) | 44.6 tok/s |
| #4 | GPT-4o mini | OpenAI | 250 ms | 25 ms | 12,750 ms (12.75 s) | 39.2 tok/s |
| #5 | GPT-4o | OpenAI | 400 ms | 45 ms | 22,900 ms (22.90 s) | 21.8 tok/s |
| #6 | Claude 3.5 Sonnet | Anthropic | 450 ms | 50 ms | 25,450 ms (25.45 s) | 19.6 tok/s |
| #7 | Gemini 1.5 Pro | Google | 480 ms | 55 ms | 27,980 ms (27.98 s) | 17.9 tok/s |
| #8 | GPT-4 Turbo | OpenAI | 500 ms | 60 ms | 30,500 ms (30.50 s) | 16.4 tok/s |
  • Fastest Model: Gemini 1.5 Flash (9,180 ms total)
  • Avg Response Time: 18,773 ms across all models
  • Output Tokens: 500 (expected generation length used for all estimates)
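
The totals above follow directly from the first-token and per-token figures: total time is roughly first-token latency plus per-token latency times the expected output length, and throughput is the output length divided by that total. Here is a minimal sketch of that arithmetic (the function name and the 500-token default are illustrative, matching the table's assumptions):

```python
# Rough estimate of end-to-end response time in streaming mode,
# assuming total = first-token latency + per-token latency * expected tokens.
def estimate_latency(first_token_ms: float, per_token_ms: float, expected_tokens: int = 500):
    total_ms = first_token_ms + per_token_ms * expected_tokens
    tokens_per_second = expected_tokens / (total_ms / 1000)
    return total_ms, tokens_per_second

# Example: Gemini 1.5 Flash figures from the table above.
total_ms, tps = estimate_latency(180, 18, 500)
print(f"{total_ms:.0f} ms total, {tps:.1f} tok/s")  # 9180 ms total, 54.5 tok/s
```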

Latency Optimization Tips

  • Use streaming mode to show partial results sooner and improve perceived performance (see the streaming sketch after this list)
  • Choose models with lower latency for real-time/interactive applications
  • Consider using faster models (like GPT-4o mini) for non-critical tasks
  • Implement caching for common queries to bypass API calls entirely (a minimal caching sketch follows the streaming example)
  • Deploy in the same region as your API provider for lower network latency
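
As a concrete illustration of the streaming tip, here is a minimal sketch using the OpenAI Python SDK (v1.x style); the model name and prompt are placeholders, and other providers expose similar streaming interfaces:

```python
# Minimal streaming example: tokens are printed as they arrive, so the user
# sees output after the first-token latency instead of waiting for the full response.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize the latency table."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```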
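
And a toy illustration of the caching tip: a small in-memory cache keyed on a hash of the prompt. The `call_api` parameter is a hypothetical stand-in for whatever client call you already use; a production setup would more likely use a shared store such as Redis and include model parameters in the cache key:

```python
# Toy in-memory cache for repeated prompts. A cache hit skips the API call
# entirely, turning a multi-second request into a sub-millisecond lookup.
import hashlib

_cache: dict[str, str] = {}

def cached_completion(prompt: str, call_api) -> str:
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key in _cache:
        return _cache[key]       # hit: no network round trip
    result = call_api(prompt)    # miss: pay full model latency once
    _cache[key] = result
    return result
```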