Overview

Gateway performance depends on several factors:
  • Hardware resources: CPU core count, memory amount
  • Traffic profile: Concurrent request count, request size, SSE/streaming usage
  • Policy complexity: Number and weight of applied policies (e.g., CPU-intensive policies like Content Filtering)
  • Backend response time: Latency profile of the target service
Tuning approach — priority order:
  1. Automatic profile — Sufficient for most environments, no intervention needed
  2. Tier recommendations — Ready-made parameter sets based on resource size (this page)
  3. Manual tuning — Fine-tuning based on load test results
The following tables show recommended Gateway Worker settings for different hardware profiles. These values should be validated with pre-production load tests.

Tier 1 — 2 Core / 2 GB RAM

Suitable for low-traffic environments, development, and PoC scenarios.
Parameter                                     Recommended Value
tuneIoThreads                                 2
tuneWorkerThreads                             512
tuneWorkerMaxThreads                          1024
tuneRoutingConnectionPoolMinThreadCount       512
tuneRoutingConnectionPoolMaxThreadCount       2048
tuneRoutingConnectionPoolMaxConnectionTotal   2048
tuneElasticsearchClientIoThreadCount          8
tuneAsyncExecutorCorePoolSize                 512
tuneAsyncExecutorMaxPoolSize                  1024
tuneAsyncExecutorQueueCapacity                1000
Tier 1 Environment Variable Block Example:
tuneIoThreads=2
tuneWorkerThreads=512
tuneWorkerMaxThreads=1024
tuneRoutingConnectionPoolMinThreadCount=512
tuneRoutingConnectionPoolMaxThreadCount=2048
tuneRoutingConnectionPoolMaxConnectionTotal=2048
tuneElasticsearchClientIoThreadCount=8

Tier 2 — 4 Core / 4 GB RAM

Recommended for medium-scale production environments. A balanced starting point for most deployments.
Parameter                                     Recommended Value
tuneIoThreads                                 4
tuneWorkerThreads                             1024
tuneWorkerMaxThreads                          2048
tuneRoutingConnectionPoolMinThreadCount       1024
tuneRoutingConnectionPoolMaxThreadCount       4096
tuneRoutingConnectionPoolMaxConnectionTotal   4096
tuneElasticsearchClientIoThreadCount          16
tuneAsyncExecutorCorePoolSize                 1024
tuneAsyncExecutorMaxPoolSize                  2048
tuneAsyncExecutorQueueCapacity                1000
Tier 2 Environment Variable Block Example:
tuneIoThreads=4
tuneWorkerThreads=1024
tuneWorkerMaxThreads=2048
tuneRoutingConnectionPoolMinThreadCount=1024
tuneRoutingConnectionPoolMaxThreadCount=4096
tuneRoutingConnectionPoolMaxConnectionTotal=4096
tuneElasticsearchClientIoThreadCount=16

Tier 3 — 8 Core / 8 GB RAM

Recommended for high-traffic production environments.
Parameter                                     Recommended Value
tuneIoThreads                                 8
tuneWorkerThreads                             2048
tuneWorkerMaxThreads                          4096
tuneRoutingConnectionPoolMinThreadCount       1024
tuneRoutingConnectionPoolMaxThreadCount       8192
tuneRoutingConnectionPoolMaxConnectionTotal   8192
tuneElasticsearchClientIoThreadCount          32
tuneAsyncExecutorCorePoolSize                 2048
tuneAsyncExecutorMaxPoolSize                  4096
tuneAsyncExecutorQueueCapacity                2000
Tier 3 Environment Variable Block Example:
tuneIoThreads=8
tuneWorkerThreads=2048
tuneWorkerMaxThreads=4096
tuneRoutingConnectionPoolMinThreadCount=1024
tuneRoutingConnectionPoolMaxThreadCount=8192
tuneRoutingConnectionPoolMaxConnectionTotal=8192
tuneElasticsearchClientIoThreadCount=32
JVM Parameters: The automatic memory profile system is recommended for all tiers (no JAVA_OPTS needed). For profile details and manual GC configuration, see JVM Garbage Collector Tuning.

Thread Tuning

Worker Threads

Worker threads are the core thread pool that processes incoming HTTP requests.
Parameter              Default   Description
tuneWorkerThreads      1024      Minimum worker thread count; threads created at server startup
tuneWorkerMaxThreads   2048      Maximum worker thread count; upper limit the pool can grow to under load

CPU Cores   tuneWorkerThreads   tuneWorkerMaxThreads
1           512                 1024
2           1024                2048
4           2048                4096
8           4096                8192
General rule: tuneWorkerThreads ≈ CPU cores × 512, tuneWorkerMaxThreads ≈ CPU cores × 1024. These values are based on the thread management model of Undertow, the HTTP server used by the Gateway.
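The rule of thumb above can be sketched as a small shell helper at deploy time (a minimal sketch; `worker_sizing` is an illustrative helper name, not a Gateway command, and the 512/1024 multipliers come from the table above):

```shell
# Sketch of the sizing rule: tuneWorkerThreads = cores * 512,
# tuneWorkerMaxThreads = cores * 1024.
worker_sizing() {
  cores=$1
  echo "tuneWorkerThreads=$((cores * 512))"
  echo "tuneWorkerMaxThreads=$((cores * 1024))"
}

# Pass the core count of the target host, e.g. from nproc:
worker_sizing "$(nproc)"
```

Treat the output as a starting point only; validate it against your load tests as recommended above.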

IO Threads

IO threads are low-level threads that handle network I/O operations (socket read/write).
Parameter       Default          Description
tuneIoThreads   CPU core count   IO thread count
IO thread count is typically kept equal to the CPU core count. Increasing it provides no benefit in most scenarios; it may cause context switching overhead.
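To keep the two in lockstep, the value can be derived from the host at deploy time (a sketch; `nproc` is a Linux coreutils command, not a Gateway feature):

```shell
# Pin IO threads to the CPU core count, per the guidance above.
cores=$(nproc)
echo "tuneIoThreads=${cores}"
```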

Async Executor Thread Pool

Asynchronous operations such as RestApi Policy, Script Policy, logging, and traffic mirroring use this separate thread pool.
Parameter                        Default                                   Description
tuneAsyncExecutorCorePoolSize    Same as tuneWorkerThreads                 Minimum number of threads kept alive in the pool
tuneAsyncExecutorMaxPoolSize     Same as tuneWorkerMaxThreads              Maximum number of threads that can be created
tuneAsyncExecutorQueueCapacity   tuneMaxQueueSize if > 0, otherwise 1000   Maximum number of tasks that can wait in the queue when all threads are busy
Thread Pool Sizing Warning: The async executor thread pool is independent of the main worker thread pool. Ensure that the combined thread count (worker + async executor) does not exceed your system’s capacity; consider CPU core count and available memory when sizing the pools.
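A quick way to apply this check is to add up the worst-case counts before rollout (a sketch using the Tier 2 recommendations from this page as inputs):

```shell
# Worst-case thread total = worker max + async executor max.
# Inputs are the Tier 2 (4 Core / 4 GB) recommended values above.
worker_max=2048
async_max=2048
total=$((worker_max + async_max))
echo "worst-case threads: ${total}"
```

Compare the total against what your load tests show the host can sustain; each live thread also carries a native stack, so memory headroom matters as much as CPU.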

Connection Pool Tuning

Routing Connection Pool

Manages HTTP connections to backend services. These values are critical for high concurrent request volumes.
Parameter                                       Default   Description
tuneRoutingConnectionPoolMinThreadCount                   Minimum connection thread count
tuneRoutingConnectionPoolMaxThreadCount                   Maximum connection thread count
tuneRoutingConnectionPoolMaxConnectionPerHost   1024      Maximum connections to a single backend host
tuneRoutingConnectionPoolMaxConnectionTotal     2048      Total maximum connections across all backend hosts
When to increase:
  • When 503 errors or connection timeout logs are observed
  • When backend services respond slowly and connections are exhausted
  • When concurrent request count exceeds current pool limits
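The symptoms above can be spot-checked in a log file before raising pool limits (a sketch; the log path and both match patterns are illustrative assumptions — adjust them to your own deployment and log format):

```shell
# Count 503 responses and connection-timeout messages in a log file.
# LOG and the grep patterns are illustrative, not Gateway defaults.
LOG=${LOG:-/var/log/gateway/access.log}
if [ -f "$LOG" ]; then
  echo "503 responses:       $(grep -c ' 503 ' "$LOG")"
  echo "connection timeouts: $(grep -ci 'connection timeout' "$LOG")"
fi
```

If either count climbs under steady traffic, that supports raising the pool limits; if both stay at zero, look at backend latency first.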

Cache Connection Pool

Manages connections to cache services.
Parameter                                   Default   Description
tuneCacheConnectionPoolMaxConnectionTotal   2048      Total maximum connection count for the cache connection pool

API Call Connection Pool

Manages external API calls made by RestApi Policy and similar policies.
Parameter                                       Default   Description
tuneApiCallConnectionPoolMaxConnectionPerHost   256       Maximum connections per host
tuneApiCallConnectionPoolMaxConnectionTotal     4096      Total maximum connection count

Timeout Tuning

Parameter                  Default             Description
tuneReadTimeout            30000 ms (30 sec)   Client data read timeout; the connection is closed if the client stops sending data
tuneStreamingReadTimeout   0 ms (unlimited)    Client-side read timeout for SSE/LLM streaming connections
tuneNoRequestTimeout       60000 ms (60 sec)   Timeout for a connection that has been established but has sent no request

Relationship Between Timeout Parameters

Client connection → tuneNoRequestTimeout (waiting for request)
                  → Request received → tuneReadTimeout (reading data)
                                      → Streaming? → tuneStreamingReadTimeout
  • tuneNoRequestTimeout: Connection opened but no HTTP request sent — cleans up idle connections
  • tuneReadTimeout: Client stopped sending data during request processing — cleans up stuck requests
  • tuneStreamingReadTimeout: For long-lived connections such as SSE (Server-Sent Events) and LLM streaming, where the normal tuneReadTimeout is insufficient
SSE/LLM Streaming Scenarios: In streaming connections, clients may not send data for extended periods, so the normal tuneReadTimeout can close the connection prematurely. tuneStreamingReadTimeout assigns a dedicated timeout value to streaming connections. The default of 0 (unlimited) suits most scenarios; however, you may set an upper limit appropriate to your environment to prevent resource leaks.
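For an SSE/LLM workload where streams should never outlive, say, 10 minutes, the cap could look like this (a sketch; the 600000 ms figure is an illustrative choice for this example, not a recommendation from this page):

```shell
# Keep the default request read timeout, but cap streaming reads at
# 10 minutes to guard against leaked connections. Values illustrative.
tuneReadTimeout=30000
tuneStreamingReadTimeout=600000
```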

Benchmark Reference

For tier-based performance comparisons and detailed benchmark results, see Capacity Planning.
Benchmark results were measured under ideal conditions (fast backend, minimal network latency, simple policy chain). Always perform load tests based on your own traffic patterns for production environments.

JVM Garbage Collector Tuning

Automatic memory profiles, GC options, and heap configuration

Capacity Planning

Hardware sizing, RAM calculation, and benchmark results

Gateway Runtimes

UI-based configuration and parameter reference for gateway environments

Pod Thread Count Periodic Monitoring

Monitoring thread usage over time and alert mechanisms