Gateway Performance Tuning
Overview
Gateway performance depends on several factors:
- Hardware resources: CPU core count, memory amount
- Traffic profile: Concurrent request count, request size, SSE/streaming usage
- Policy complexity: Number and weight of applied policies (e.g., CPU-intensive policies like Content Filtering)
- Backend response time: Latency profile of the target service
Tuning approach — priority order:
- Automatic profile — Sufficient for most environments, no intervention needed
- Tier recommendations — Ready-made parameter sets based on resource size (this page)
- Manual tuning — Fine-tuning based on load test results
Tier-Based Recommended Settings
The following tables show recommended Gateway Worker settings for different hardware profiles. These values should be validated with pre-production load tests.
Tier 1 — 2 Core / 2 GB RAM
Suitable for low-traffic environments, development, and PoC scenarios.
| Parameter | Recommended Value |
|---|---|
| tuneIoThreads | 2 |
| tuneWorkerThreads | 512 |
| tuneWorkerMaxThreads | 1024 |
| tuneRoutingConnectionPoolMinThreadCount | 512 |
| tuneRoutingConnectionPoolMaxThreadCount | 2048 |
| tuneRoutingConnectionPoolMaxConnectionTotal | 2048 |
| tuneElasticsearchClientIoThreadCount | 8 |
| tuneAsyncExecutorCorePoolSize | 512 |
| tuneAsyncExecutorMaxPoolSize | 1024 |
| tuneAsyncExecutorQueueCapacity | 1000 |
Tier 1 Environment Variable Block Example:
tuneIoThreads=2
tuneWorkerThreads=512
tuneWorkerMaxThreads=1024
tuneRoutingConnectionPoolMinThreadCount=512
tuneRoutingConnectionPoolMaxThreadCount=2048
tuneRoutingConnectionPoolMaxConnectionTotal=2048
tuneElasticsearchClientIoThreadCount=8
Tier 2 — 4 Core / 4 GB RAM
Recommended for medium-scale production environments. A balanced starting point for most deployments.
| Parameter | Recommended Value |
|---|---|
| tuneIoThreads | 4 |
| tuneWorkerThreads | 1024 |
| tuneWorkerMaxThreads | 2048 |
| tuneRoutingConnectionPoolMinThreadCount | 1024 |
| tuneRoutingConnectionPoolMaxThreadCount | 4096 |
| tuneRoutingConnectionPoolMaxConnectionTotal | 4096 |
| tuneElasticsearchClientIoThreadCount | 16 |
| tuneAsyncExecutorCorePoolSize | 1024 |
| tuneAsyncExecutorMaxPoolSize | 2048 |
| tuneAsyncExecutorQueueCapacity | 1000 |
Tier 2 Environment Variable Block Example:
tuneIoThreads=4
tuneWorkerThreads=1024
tuneWorkerMaxThreads=2048
tuneRoutingConnectionPoolMinThreadCount=1024
tuneRoutingConnectionPoolMaxThreadCount=4096
tuneRoutingConnectionPoolMaxConnectionTotal=4096
tuneElasticsearchClientIoThreadCount=16
Tier 3 — 8 Core / 8 GB RAM
Recommended for high-traffic production environments.
| Parameter | Recommended Value |
|---|---|
| tuneIoThreads | 8 |
| tuneWorkerThreads | 2048 |
| tuneWorkerMaxThreads | 4096 |
| tuneRoutingConnectionPoolMinThreadCount | 1024 |
| tuneRoutingConnectionPoolMaxThreadCount | 8192 |
| tuneRoutingConnectionPoolMaxConnectionTotal | 8192 |
| tuneElasticsearchClientIoThreadCount | 32 |
| tuneAsyncExecutorCorePoolSize | 2048 |
| tuneAsyncExecutorMaxPoolSize | 4096 |
| tuneAsyncExecutorQueueCapacity | 2000 |
Tier 3 Environment Variable Block Example:
tuneIoThreads=8
tuneWorkerThreads=2048
tuneWorkerMaxThreads=4096
tuneRoutingConnectionPoolMinThreadCount=1024
tuneRoutingConnectionPoolMaxThreadCount=8192
tuneRoutingConnectionPoolMaxConnectionTotal=8192
tuneElasticsearchClientIoThreadCount=32
JVM Parameters: The automatic memory profile system is recommended for all tiers, so there is no need to add JAVA_OPTS. For profile details and manual GC configuration, see JVM Garbage Collector Tuning.
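The tier tables above can be kept as data and rendered into an environment variable block. The following is a minimal sketch, not part of the Gateway: the names `PARAMS`, `TIER_VALUES`, `pick_tier`, and `render_env_block` are hypothetical, and the rule of falling back to the largest tier that fits the available cores is an assumption made for illustration.

```python
# Parameter names and values are copied from the tier tables on this page.
# Everything else (helper names, tier selection rule) is illustrative only.
PARAMS = (
    "tuneIoThreads",
    "tuneWorkerThreads",
    "tuneWorkerMaxThreads",
    "tuneRoutingConnectionPoolMinThreadCount",
    "tuneRoutingConnectionPoolMaxThreadCount",
    "tuneRoutingConnectionPoolMaxConnectionTotal",
    "tuneElasticsearchClientIoThreadCount",
    "tuneAsyncExecutorCorePoolSize",
    "tuneAsyncExecutorMaxPoolSize",
    "tuneAsyncExecutorQueueCapacity",
)

# Keyed by CPU core count: Tier 1 = 2 cores, Tier 2 = 4 cores, Tier 3 = 8 cores.
TIER_VALUES = {
    2: (2, 512, 1024, 512, 2048, 2048, 8, 512, 1024, 1000),
    4: (4, 1024, 2048, 1024, 4096, 4096, 16, 1024, 2048, 1000),
    8: (8, 2048, 4096, 1024, 8192, 8192, 32, 2048, 4096, 2000),
}

TIERS = {cores: dict(zip(PARAMS, values)) for cores, values in TIER_VALUES.items()}


def pick_tier(cpu_cores: int) -> dict:
    """Return the largest tier whose core count does not exceed cpu_cores."""
    eligible = [cores for cores in TIERS if cores <= cpu_cores]
    if not eligible:
        raise ValueError("no tier is defined for fewer than 2 cores")
    return TIERS[max(eligible)]


def render_env_block(settings: dict) -> str:
    """Render settings as KEY=value lines, one per parameter."""
    return "\n".join(f"{name}={value}" for name, value in settings.items())


if __name__ == "__main__":
    print(render_env_block(pick_tier(4)))
```

Calling `render_env_block(pick_tier(4))` yields the Tier 2 values, including the three async executor parameters from the table. Whatever the source of the values, validate them with a pre-production load test.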
Thread Tuning
Worker Threads
Worker threads are the core thread pool that processes incoming HTTP requests.
| Parameter | Default | Description |
|---|---|---|
| tuneWorkerThreads | 1024 | Minimum Worker thread count; threads created at server startup |
| tuneWorkerMaxThreads | 2048 | Maximum Worker thread count; upper limit the pool can grow to under load |
Recommended Thread Values by CPU
| CPU Cores | tuneWorkerThreads | tuneWorkerMaxThreads |
|---|---|---|
| 1 | 512 | 1024 |
| 2 | 1024 | 2048 |
| 4 | 2048 | 4096 |
| 8 | 4096 | 8192 |
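General rule: tuneWorkerThreads ≈ CPU × 512, tuneWorkerMaxThreads ≈ CPU × 1024. These values follow the thread management model of Undertow, the HTTP server used by Gateway. The tier tables on this page use lower values (roughly CPU × 256 and CPU × 512), so treat either set as a starting point and validate it with load tests.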
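The general rule can be written as a small helper. This is a sketch only: the function name `recommended_worker_threads` is hypothetical, and the multipliers are the approximate ones stated in the rule.

```python
from typing import Tuple


def recommended_worker_threads(cpu_cores: int) -> Tuple[int, int]:
    """Apply the rule of thumb: min ~ CPU x 512, max ~ CPU x 1024.

    Returns (tuneWorkerThreads, tuneWorkerMaxThreads).
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return cpu_cores * 512, cpu_cores * 1024


if __name__ == "__main__":
    for cores in (1, 2, 4, 8):
        minimum, maximum = recommended_worker_threads(cores)
        print(f"{cores} cores: tuneWorkerThreads={minimum}, tuneWorkerMaxThreads={maximum}")
```

For 1, 2, 4, and 8 cores this reproduces the rows of the table above.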
IO Threads
IO threads are low-level threads that handle network I/O operations (socket read/write).
| Parameter | Default | Description |
|---|---|---|
| tuneIoThreads | CPU core count | IO thread count |
IO thread count is typically kept equal to the CPU core count. Raising it above the core count provides no benefit in most scenarios and may add context-switching overhead.
Async Executor Thread Pool
Asynchronous operations such as RestApi Policy, Script Policy, logging, and traffic mirroring use this separate thread pool.
| Parameter | Default | Description |
|---|---|---|
| tuneAsyncExecutorCorePoolSize | Same as tuneWorkerThreads | Minimum number of threads kept alive in the pool |
| tuneAsyncExecutorMaxPoolSize | Same as tuneWorkerMaxThreads | Maximum number of threads that can be created |
| tuneAsyncExecutorQueueCapacity | tuneMaxQueueSize if > 0, otherwise 1000 | Maximum number of tasks that can wait in the queue when all threads are busy |
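The default rules in the table above can be sketched as a lookup over the configured environment. This is an illustration, not the Gateway's actual resolution code: the function name `resolve_async_executor` is hypothetical, and treating an unset tuneMaxQueueSize as 0 is an assumption.

```python
def resolve_async_executor(env: dict) -> dict:
    """Resolve effective async executor settings from a dict of env values.

    Explicit tuneAsyncExecutor* values win; otherwise the documented
    defaults are derived from the worker thread settings.
    """
    worker = int(env.get("tuneWorkerThreads", 1024))        # documented default
    worker_max = int(env.get("tuneWorkerMaxThreads", 2048))  # documented default
    # Assumption: an unset tuneMaxQueueSize behaves like 0.
    max_queue = int(env.get("tuneMaxQueueSize", 0))
    default_queue = max_queue if max_queue > 0 else 1000

    return {
        "tuneAsyncExecutorCorePoolSize": int(
            env.get("tuneAsyncExecutorCorePoolSize", worker)
        ),
        "tuneAsyncExecutorMaxPoolSize": int(
            env.get("tuneAsyncExecutorMaxPoolSize", worker_max)
        ),
        "tuneAsyncExecutorQueueCapacity": int(
            env.get("tuneAsyncExecutorQueueCapacity", default_queue)
        ),
    }
```

With an empty environment the sketch returns 1024, 2048, and 1000. Setting only `tuneWorkerThreads=512` lowers the core pool size to 512 while leaving the other two values unchanged.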
Thread Pool Sizing Warning:
The async executor thread pool is independent from the main worker thread pool. Ensure that the total thread count (worker + async executor) does not exceed your system's capacity. Consider CPU core count and available memory when determining thread counts.
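One rough way to check the warning above is to add up the maximum thread counts and multiply by the per-thread stack size. The sketch below assumes a 1 MB stack per thread, which is a common JVM default on 64-bit Linux but depends on the platform and on -Xss. The function name `thread_stack_budget_mb` is hypothetical. The result is a worst-case reservation, not committed memory: stacks are typically committed lazily, and pools rarely sit at their maximum.

```python
def thread_stack_budget_mb(
    worker_max_threads: int,
    async_max_threads: int,
    io_threads: int,
    stack_mb: float = 1.0,  # assumption: 1 MB per thread stack
) -> float:
    """Worst-case stack reservation in MB if every pool reaches its maximum."""
    total_threads = worker_max_threads + async_max_threads + io_threads
    return total_threads * stack_mb


if __name__ == "__main__":
    # Tier 1 values from this page: 1024 worker max, 1024 async max, 2 IO threads.
    print(thread_stack_budget_mb(1024, 1024, 2))
```

For the Tier 1 values this gives 2050 MB, about the same as the tier's 2 GB of RAM. That comparison is why the maximums should be read as ceilings to validate under load rather than as levels the pools are expected to sustain.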
Connection Pool Tuning
Routing Connection Pool
Manages HTTP connections to backend services. These values are critical for high concurrent request volumes.
| Parameter | Default | Description |
|---|---|---|
| tuneRoutingConnectionPoolMinThreadCount | - | Minimum connection thread count |
| tuneRoutingConnectionPoolMaxThreadCount | - | Maximum connection thread count |
| tuneRoutingConnectionPoolMaxConnectionPerHost | 1024 | Maximum connections to a single backend host |
| tuneRoutingConnectionPoolMaxConnectionTotal | 2048 | Total maximum connections to all backend hosts |
When to increase:
- When 503 errors or connection timeout logs are observed
- When backend services respond slowly and connections are exhausted
- When concurrent request count exceeds current pool limits
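To judge whether concurrent requests are likely to exceed the pool limits, Little's Law gives a quick estimate: connections in flight ≈ request rate × backend response time. The sketch below applies it. The function names and the 1.5× headroom factor are assumptions chosen for illustration, not Gateway behavior.

```python
import math


def required_connections(
    requests_per_second: float,
    backend_latency_ms: float,
    headroom: float = 1.5,  # assumption: 50% safety margin for bursts
) -> int:
    """Estimate backend connections needed, using Little's Law plus headroom."""
    in_flight = requests_per_second * backend_latency_ms / 1000.0
    return math.ceil(in_flight * headroom)


def pool_is_sufficient(
    requests_per_second: float,
    backend_latency_ms: float,
    max_connection_total: int = 2048,  # default of tuneRoutingConnectionPoolMaxConnectionTotal
) -> bool:
    """True if the estimated need fits within the configured pool total."""
    needed = required_connections(requests_per_second, backend_latency_ms)
    return needed <= max_connection_total


if __name__ == "__main__":
    print(required_connections(2000, 500), pool_is_sufficient(2000, 500))
    print(required_connections(2000, 800), pool_is_sufficient(2000, 800))
```

At 2000 requests per second, a backend that responds in 500 ms needs an estimated 1500 connections and fits the default total of 2048. At 800 ms the estimate rises to 2400, which matches the "backend services respond slowly and connections are exhausted" case above.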
Cache Connection Pool
Manages connections to cache services.
| Parameter | Default | Description |
|---|---|---|
| tuneCacheConnectionPoolMaxConnectionTotal | 2048 | Cache Connection Pool total maximum connection count |
API Call Connection Pool
Manages external API calls made by RestApi Policy and similar policies.
| Parameter | Default | Description |
|---|---|---|
| tuneApiCallConnectionPoolMaxConnectionPerHost | 256 | Maximum connections per host |
| tuneApiCallConnectionPoolMaxConnectionTotal | 4096 | Total maximum connection count |
Timeout Tuning
| Parameter | Default | Description |
|---|---|---|
| tuneReadTimeout | 30000 ms (30 sec) | Client data read timeout. Connection is closed if client stops sending data |
| tuneStreamingReadTimeout | 0 ms (unlimited) | Client-side read timeout for SSE/LLM streaming connections |
| tuneNoRequestTimeout | 60000 ms (60 sec) | No-request timeout after connection is established |
Relationship Between Timeout Parameters
Client connection → tuneNoRequestTimeout (waiting for request)
→ Request received → tuneReadTimeout (reading data)
→ Streaming? → tuneStreamingReadTimeout
- tuneNoRequestTimeout: The connection was opened but no HTTP request was sent. Cleans up idle connections.
- tuneReadTimeout: The client stopped sending data during request processing. Cleans up stuck requests.
- tuneStreamingReadTimeout: Applies to long-lived connections such as SSE (Server-Sent Events) and LLM streaming, for which the normal tuneReadTimeout is not suitable.
SSE/LLM Streaming Scenarios:
In streaming connections, clients don't send data for extended periods, so the normal tuneReadTimeout may prematurely close the connection. tuneStreamingReadTimeout assigns a dedicated timeout value for streaming connections. The default value of 0 (unlimited) is suitable for most scenarios; however, you may set an appropriate upper limit for your environment to prevent resource leaks.
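The relationship between the three timeouts can be modeled as a simple selection by connection phase. This is a simplified sketch of the flow described in this section, not the Gateway's implementation. The function name `applicable_timeout` and its boolean arguments are hypothetical. The defaults come from the table above.

```python
from typing import Optional, Tuple

# Defaults from the Timeout Tuning table, in milliseconds.
DEFAULTS = {
    "tuneNoRequestTimeout": 60000,
    "tuneReadTimeout": 30000,
    "tuneStreamingReadTimeout": 0,  # 0 means unlimited
}


def applicable_timeout(
    request_received: bool,
    streaming: bool,
    cfg: Optional[dict] = None,
) -> Tuple[str, Optional[int]]:
    """Return (parameter name, timeout in ms) for the current phase.

    A timeout of None means unlimited.
    """
    settings = {**DEFAULTS, **(cfg or {})}
    if not request_received:
        name = "tuneNoRequestTimeout"      # connection open, waiting for a request
    elif streaming:
        name = "tuneStreamingReadTimeout"  # SSE or LLM streaming connection
    else:
        name = "tuneReadTimeout"           # regular request, reading client data
    value = settings[name]
    if name == "tuneStreamingReadTimeout" and value == 0:
        return name, None
    return name, value
```

With the defaults, a streaming request resolves to an unlimited timeout. Passing `{"tuneStreamingReadTimeout": 300000}` caps streaming connections at five minutes, which is one way to set the upper limit mentioned above.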
Benchmark Reference
For tier-based performance comparisons and detailed benchmark results, see Capacity Planning.
Benchmark results were measured under ideal conditions (fast backend, minimal network latency, simple policy chain). Always perform load tests based on your own traffic patterns for production environments.
Related Pages
- Automatic memory profiles, GC options, and heap configuration
- Hardware sizing, RAM calculation, and benchmark results
- UI-based configuration and parameter reference for gateway environments
- Monitoring thread usage over time and alert mechanisms