Capacity planning is the process of determining the resources required for successful installation and operation of the Apinizer platform. This page is a comprehensive guide prepared for system administrators, network administrators, and infrastructure teams.

Capacity Planning Process

When performing capacity planning, the following factors should be considered:

Traffic Estimation

Traffic estimation forms the basis of capacity planning. The following points should be evaluated:
  • Expected API call count: The API call volume expected on a daily and hourly basis should be determined
  • Peak traffic estimation: Traffic volume during highest traffic periods (e.g., campaign periods, special days) should be estimated
  • Traffic growth projection: Projection should be made for future traffic increase
  • Seasonal variations: Seasonal traffic variations throughout the year should be considered

Deployment Model

The deployment model where Apinizer will be installed directly affects resource requirements. Current options:
  • Topology 1: Test and PoC: Simple installation model where all components run on a single server
  • Topology 2: Professional Installation: Model where components are distributed across different servers
  • Topology 3: High Availability: Model where clustering is done for high availability
  • Topology 4: Inter-Regional Distribution: Global distribution, geographical backup
For detailed information, you can refer to the Deployment Models page.

High Availability

High availability requirements should be planned for critical business applications:
  • Uptime requirements: How long the system needs to run without interruption should be determined (generally 99.9%+)
  • Failover strategy: How traffic will be routed when a component fails should be planned
  • Disaster recovery plan: How the system will be recovered in case of large-scale failures should be planned

Growth Plan

Capacity planning should cover not only current requirements but also future growth:
  • Short-term requirements: Resources required for 6-month period
  • Medium-term requirements: Resources planned for 1-2 year period
  • Long-term requirements: Resource requirements projected for 3 years and above

Tier-Based Capacity and Hardware Requirements

Traffic Categories

Tier   | Traffic Level           | Usage Purpose
Tier 1 | < 500K requests/day     | Test/POC environments
Tier 2 | 500K - 3M requests/day  | Production environments
Tier 3 | > 3M requests/day       | Enterprise production (20M+ requests/day supported with 8 cores)

Worker Nodes (API Gateway)

Worker nodes are the main components that process API traffic. The table below contains performance data based on real benchmark tests.
Feature    | Tier 1 | Tier 2 | Tier 3
CPU        | 2 core | 4 core | 8 core
RAM        | 2 GB   | 4 GB   | 8 GB
Node Count | 1      | 2      | 4
IOPS       | 1,000  | 3,000  | 5,000+
IO Threads | 2      | 4      | 8

RAM Requirement Calculation

Worker node RAM requirements should be calculated based on the factors below.
Calculation Formula:
RAM = (RAM Allocated for the Operating System) + (Max. Thread Count × Thread Stack Size) + ((Average Request Size + Average Response Size) × Concurrent Request Count × Buffer Factor (1.5-2.0))
1. RAM Requirement Based on Thread Count: Thread count directly affects RAM requirements. Stack space is allocated for each thread in the JVM:
  • Maximum Thread Count: Max Thread Count value in connection pool settings
  • Thread Stack Size (-Xss): Stack space allocated for each thread via JVM parameter
    • Default value: Usually 1 MB (may vary by platform and JVM version)
    • Recommended value: -Xss1m (1 MB) or -Xss512k (512 KB) - depending on application requirements
  • Calculation: Maximum thread count × Thread Stack Size = Minimum thread stack RAM requirement
Example Calculation:
  • Maximum thread count: 4,096
  • Thread Stack Size (-Xss): 1 MB (default)
  • Thread Stack RAM requirement: 4,096 × 1 MB = ~4 GB (only for thread stacks)
JVM Parameter:
-Xss1m  # 1 MB stack space per thread
# or
-Xss512k  # 512 KB stack space per thread (less RAM usage)
2. RAM Requirement Based on Average Message Size: The size of request and response messages significantly affects RAM usage:
  • Small Messages (< 1 KB): Minimal RAM requirement
  • Medium Messages (1-10 KB): Medium level RAM requirement
  • Large Messages (> 10 KB): High RAM requirement
  • Very Large Messages (> 100 KB): Very high RAM requirement
3. RAM Requirement Based on File Processing Status: If APIs perform file upload/download operations, additional RAM requirement occurs:
  • No File Processing: No additional RAM requirement
  • Small Files (< 1 MB): Minimal additional RAM
  • Medium Files (1-10 MB): Medium level additional RAM
  • Large Files (> 10 MB): High additional RAM requirement
4. RAM Requirement Based on Instant Load Amount: Estimated instant load amount (peak traffic) affects RAM requirement:
  • Low Load (< 100 reqs/sec): Minimal RAM
  • Medium Load (100-1,000 reqs/sec): Medium level RAM
  • High Load (1,000-10,000 reqs/sec): High RAM
  • Very High Load (> 10,000 reqs/sec): Very high RAM
5. Operating System and Other Components: RAM should be allocated for operating system and other system components:
  • Operating System: Usually 1-2 GB
  • Kubernetes Components: ~500 MB for Kubelet, kube-proxy, etc.
  • Other System Services: ~500 MB
Total RAM Calculation Example: Tier 2 Scenario (4 Core / 4 GB RAM), with a code sketch of this calculation after the list:
  • Thread RAM: 4,096 threads × 1 MB = 4 GB
  • Message RAM: (5 KB average) × 1,000 concurrent requests × 1.5 = ~7.5 MB (minimal)
  • File Processing: None = 0 GB
  • Operating System: ~2 GB
  • Total: ~6 GB (8 GB recommended for safety)
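The snippet below is a minimal sketch of the calculation formula above, assuming the Tier 2 example values (4,096 threads, 1 MB stack, 5 KB average message, 1,000 concurrent requests); all inputs are illustrative and should be replaced with your own measurements.

def worker_ram_gb(os_ram_gb=2.0,            # operating system + Kubernetes components + other services
                  max_threads=4096,          # Routing Connection Pool - Max Thread Count
                  thread_stack_mb=1.0,       # JVM -Xss value
                  avg_message_kb=5.0,        # average request + response size
                  concurrent_requests=1000,
                  buffer_factor=1.5):        # 1.5-2.0
    thread_stack_gb = max_threads * thread_stack_mb / 1024
    message_gb = avg_message_kb / 1024 / 1024 * concurrent_requests * buffer_factor
    return os_ram_gb + thread_stack_gb + message_gb

print(round(worker_ram_gb(), 1))  # ~6.0 GB, matching the example above (8 GB recommended for safety)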
RAM Setting in Kubernetes YAML: In Kubernetes YAML files, the request and limit values should be set to the same value. The share of this RAM that the JVM uses is controlled by a JVM parameter:
resources:
  requests:
    memory: "8Gi"  # Minimum requirement
  limits:
    memory: "8Gi"  # Maximum limit (same as request)
JVM MaxRAMPercentage Parameter: The MaxRAMPercentage parameter determines how much of the RAM allocated to the container the JVM will use. The recommended value is 75%.
Example:
  • Node RAM: 8 GB
  • Container RAM Limit: 8 GB
  • JVM MaxRAMPercentage: 75%
  • JVM RAM to Use: 8 GB × 0.75 = 6 GB
  • Operating System and Other: 8 GB × 0.25 = 2 GB
resources:
  requests:
    memory: "8Gi"
  limits:
    memory: "8Gi"
JVM Parameters:
-server -XX:MaxRAMPercentage=75.0
In this way, the JVM uses 75% (6 GB) of the 8 GB RAM allocated to the container, and the remaining 25% (2 GB) is reserved for the operating system and other system components.
Important: When the Thread Stack Size (-Xss) parameter is not set, the JVM uses its default value (usually 1 MB). If a high concurrent thread count is used, a significant amount of RAM is consumed by thread stacks; for example, ~4 GB of RAM is needed for stack space alone with 4,096 threads. In production environments, you must calculate and test RAM requirements according to your own traffic patterns. If necessary, you can reduce RAM usage by lowering the -Xss parameter (e.g., to 512 KB).
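As a rough cross-check of the split described above, the sketch below computes the JVM share, the operating-system share, and the thread-stack footprint for the 8 GB example; the container size, MaxRAMPercentage, -Xss value, and thread count are the example values from this page, not fixed recommendations.

container_ram_gb = 8            # Kubernetes memory request/limit
max_ram_percentage = 75.0       # -XX:MaxRAMPercentage
xss_mb = 1.0                    # -Xss thread stack size
max_threads = 4096              # connection pool max thread count

jvm_ram_gb = container_ram_gb * max_ram_percentage / 100    # 6.0 GB available to the JVM
os_ram_gb = container_ram_gb - jvm_ram_gb                    # 2.0 GB left for OS and system components
thread_stack_gb = max_threads * xss_mb / 1024                # 4.0 GB used by thread stacks alone

print(jvm_ram_gb, os_ram_gb, thread_stack_gb)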

Benchmark Performance Results

Important: The performance data below are estimated values and may vary depending on many parameters. These values were measured under ideal conditions where the backend service responds quickly (network latency is minimal). In real production environments, performance may vary depending on the following factors:
  • Network latency: Network delay to backend service
  • Backend API performance: Response time and processing capacity of backend service
  • Policy complexity: Number and complexity of policies added to the gateway
  • Request body size: Large payloads require more processing power
  • Concurrent load pattern: Distribution of traffic and peak moments
  • System resources: CPU, RAM and disk I/O usage
Therefore, performance tests must be performed according to your own traffic patterns and infrastructure for production environments.
The performance data below were measured in a real test environment under concurrent thread loads. Tests were performed under ideal conditions where the backend service responds quickly (network latency is minimal).
2 Core / 2 GB RAM Configuration (Tier 1 - Test/POC):
  • GET Requests:
    • 50 thread: ~2,200 reqs/sec (average 22ms)
    • 100 thread: ~2,200 reqs/sec (average 45ms)
    • 250 thread: ~2,100 reqs/sec (average 119ms)
  • POST Requests (5KB body):
    • 50 thread: ~1,900 reqs/sec (average 26ms)
    • 100 thread: ~1,800 reqs/sec (average 56ms)
    • 250 thread: ~1,500 reqs/sec (average 170ms)
4 Core / 4 GB RAM Configuration (Tier 2 - Production):
  • GET Requests:
    • 50 thread: ~8,000 reqs/sec (average 6ms)
    • 500 thread: ~6,700 reqs/sec (average 73ms)
    • 1,000 thread: ~6,700 reqs/sec (average 147ms)
  • POST Requests (5KB body):
    • 50 thread: ~7,300 reqs/sec (average 6ms)
    • 500 thread: ~7,100 reqs/sec (average 69ms)
    • 1,000 thread: ~7,000 reqs/sec (average 141ms)
8 Core / 8 GB RAM Configuration (Tier 3 - Enterprise):
  • GET Requests:
    • 50 thread: ~15,400 reqs/sec (average 3ms)
    • 500 thread: ~15,600 reqs/sec (average 31ms)
    • 1,000 thread: ~15,400 reqs/sec (average 64ms)
    • 2,000 thread: ~14,800 reqs/sec (average 133ms)
    • 4,000 thread: ~14,300 reqs/sec (average 276ms)
    • 8,000 thread: ~11,600 reqs/sec (average 655ms)
  • POST Requests (5KB body):
    • 50 thread: ~13,400 reqs/sec (average 3ms)
    • 500 thread: ~13,600 reqs/sec (average 36ms)
    • 1,000 thread: ~13,500 reqs/sec (average 73ms)
    • 2,000 thread: ~13,200 reqs/sec (average 150ms)
    • 4,000 thread: ~12,800 reqs/sec (average 309ms)
    • 8,000 thread: ~11,100 reqs/sec (average 701ms)
  • POST Requests (50KB body):
    • 50 thread: ~4,700 reqs/sec (average 10ms)
    • 500 thread: ~3,500 reqs/sec (average 142ms)
    • 1,000 thread: ~3,000 reqs/sec (average 326ms)
    • 2,000 thread: ~2,800 reqs/sec (average 710ms)
Capacity Calculation Example: 20 million+ requests per day can be supported with 8 core / 8 GB RAM configuration. Example calculation:
  • 20 million requests per day = Average ~231 reqs/sec
  • Peak traffic (60% of traffic in 20% of day): ~694 reqs/sec
  • 8 core configuration has capacity of ~15,400 reqs/sec (GET) at 1,000 threads
  • Therefore, 20 million+ requests per day can be easily met with 8 cores
Note: These values are estimates and may vary depending on network latency, backend API performance, policy complexity, and other factors. Performance tests should be performed in your own environment for real capacity planning. Horizontal scaling (by adding additional worker nodes via Kubernetes) can be done for higher traffic requirements.
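The short sketch below reproduces the daily-to-peak conversion used in the example above; the 60%/20% peak assumption and the 15,400 reqs/sec benchmark figure are taken from this page and should be replaced with your own measurements.

daily_requests = 20_000_000
avg_rps = daily_requests / 86_400                        # ~231 reqs/sec average
peak_rps = (daily_requests * 0.60) / (86_400 * 0.20)     # ~694 reqs/sec at peak
gateway_capacity_rps = 15_400                            # 8-core GET benchmark at 1,000 threads
print(round(avg_rps), round(peak_rps), peak_rps < gateway_capacity_rps)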
Important Notes:
  • Concurrent Thread vs Request: Concurrent thread count and instant request count are different concepts. A thread can make multiple requests within a given time period. Since gateways generally operate statelessly, concurrent request count and latency measurement are more meaningful.
  • Performance Affecting Factors:
    • Backend service response time
    • Network delay
    • Complexity and number of policies added to the gateway
    • Request body size (large payloads require more processing power)
  • Scaling: Vertical scaling has a limit. To support more concurrent users with acceptable response times, horizontal scaling (can be easily done via Kubernetes) or vertical + horizontal combination should be considered.
  • Policy Impact: Each policy added to the gateway affects performance. While simple policies like “Basic Authentication” have minimal impact, policies requiring high processing power like “Content Filtering” or policies requiring external connections like “LDAP Authentication” have more performance impact.

Worker Node Configuration Settings

Worker node performance depends on the following configuration settings. These settings can be made from the API Gateway Environment Settings screen.
Tier 1 (2 Core / 2 GB RAM) Recommended Settings:
  • IO Threads: 2
  • Routing Connection Pool - Min Thread Count: 512
  • Routing Connection Pool - Max Thread Count: 2,048
  • Routing Connection Pool - Max Connections Total: 2,048
  • Elasticsearch Client - IO Thread Count: 8
Tier 2 (4 Core / 4 GB RAM) Recommended Settings:
  • IO Threads: 4
  • Routing Connection Pool - Min Thread Count: 1,024
  • Routing Connection Pool - Max Thread Count: 4,096
  • Routing Connection Pool - Max Connections Total: 4,096
  • Elasticsearch Client - IO Thread Count: 16
Tier 3 (8 Core / 8 GB RAM) Recommended Settings:
  • IO Threads: 8
  • Routing Connection Pool - Min Thread Count: 1,024
  • Routing Connection Pool - Max Thread Count: 8,192
  • Routing Connection Pool - Max Connections Total: 8,192
  • Elasticsearch Client - IO Thread Count: 32
JVM Parameters:
  • -server -XX:MaxRAMPercentage=75.0
Attention: These performance data were measured under ideal conditions where the backend service responds quickly and network latency is minimal. In real production environments, backend service response time, network latency, and added policies can significantly affect performance. Performance tests must be performed according to your own traffic patterns for production environments.

Manager Nodes

Manager nodes provide the web interface where configurations are made. They do not depend on daily traffic load; they are used for configuration management and interface access. A single node is sufficient for a standalone installation; for production environments, an HA installation with at least 2 nodes is recommended for high availability.

MongoDB Replica Set

MongoDB is Apinizer’s configuration database. It stores API proxy definitions, policy configurations, and metadata information. It does not depend on daily traffic load; requirements are determined based on API proxy count and configuration complexity.
Critical: MongoDB must be configured as a Replica Set. Even for a single node, it must be installed as a replica set; a standalone instance should not be used.

Elasticsearch Cluster

Elasticsearch stores log and analytics data.
Feature                | Tier 1     | Tier 2     | Tier 3
CPU                    | 4 core     | 8 core     | 16 core
RAM                    | 16 GB      | 32 GB      | 64 GB
Disk (Hot)             | 500 GB SSD | 2 TB SSD   | 5+ TB SSD
Disk (Warm)            | 1 TB HDD   | 3 TB HDD   | 10+ TB HDD
Network                | 1 Gbps     | 1-10 Gbps  | 10+ Gbps
Node Count             | 1          | 1 (min)    | 3 (min)
Shard/Index            | 1          | 1 (min)    | 3
Daily Log              | < 10 GB    | 10-100 GB  | > 100 GB
Monthly Log            | ~300 GB    | ~1.5 TB    | ~6 TB+
Annual Log (Retention) | ~3.6 TB    | ~18 TB     | ~72 TB+
Replica Count and Data Size: The total data size to be stored in Elasticsearch varies depending on the replica count. The replica count determines how many copies of the same data are stored and directly affects the disk requirement (a code sketch of this calculation appears after the notes below).
Disk Requirement Calculation:
Total Disk Requirement = (Primary Shard Data Size) × (1 + Replica Count) × Buffer Factor (1.2-1.5)
Example Scenarios:
  • Replica = 0: Data is not replicated. Single copy is kept. Minimum disk requirement.
    • Example: 1 TB primary data → ~1.2 TB disk requirement
  • Replica = 1: One more copy of data is kept. Recommended for high availability.
    • Example: 1 TB primary data → ~2.4 TB disk requirement (1 TB primary + 1 TB replica)
  • Replica = 2: Two more copies of data are kept. For very high availability.
    • Example: 1 TB primary data → ~3.6 TB disk requirement (1 TB primary + 2 TB replica)
Important Notes:
  • Production Environments: Minimum replica = 1 is recommended for high availability. In this case, disk requirement approximately doubles.
  • Test/PoC Environments: Replica = 0 can be used, disk requirement is lower.
  • Disk Planning: Both high availability and disk cost should be considered when determining replica count.
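A minimal sketch of the disk formula above; the 1 TB primary data size and the 1.2 buffer factor are the example values used in the scenarios and should be adjusted to your own retention plan.

def es_disk_tb(primary_data_tb, replica_count=1, buffer_factor=1.2):
    # Total Disk = primary data × (1 + replica count) × buffer factor
    return primary_data_tb * (1 + replica_count) * buffer_factor

print(es_disk_tb(1.0, replica_count=0))  # ~1.2 TB
print(es_disk_tb(1.0, replica_count=1))  # ~2.4 TB
print(es_disk_tb(1.0, replica_count=2))  # ~3.6 TB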
Disk Strategy: Using SSD for Hot tier and HDD for Warm/Cold tier is recommended. This provides cost optimization.

Cache Cluster (Hazelcast)

The Cache Cluster is Apinizer’s distributed cache infrastructure and should be sized based on the data to be stored and the processing to be done. Response cache, routing load balancer information, rate limit counters, and other performance-critical data are stored here. Cache size depends on cacheable data volume, access frequency, and replication settings rather than daily traffic count.

Factors Affecting Sizing

Cache Cluster sizing should be determined based on the following factors:
1. Data Type and Size to be Stored:
  • Response Cache: Size and count of response data to be cached
  • Routing/Load Balancer Information: Backend endpoint information and load balancing data
  • Rate Limit Counters: Counter data used for rate limit (usually small data, but accessed very frequently)
  • OAuth2 Tokens: Data volume for token caching
  • Other Performance Data: Quota, throttling, and other cached data
2. Access Pattern and Processing Intensity:
  • Rate Limit Usage: If rate limit is used, counters are read and written very frequently, so high thread count and high CPU requirement occurs. However, since cached data is small, RAM requirement may be relatively low.
  • Response Cache Usage: If large responses are cached, high RAM requirement occurs.
  • Concurrent Access: How many threads will access the cache at the same time affects CPU and thread requirements.
3. Replication Factor: Replication factor determines how many nodes the same data will be stored on and directly affects RAM requirement:
  • Replication Factor = 0: Data is not replicated. Each node keeps its own data. Total RAM requirement is lower.
  • Replication Factor = 1: One more copy of data is kept. RAM requirement increases for the same data volume.
RAM Calculation Formula:
Total RAM = (Cache Data Size) × (1 + Replication Factor) × Buffer Factor (1.2-1.5)
Example Scenarios:
Small Cache (32 GB):
  • Cache Data Size: ~20 GB
  • Replication Factor: 1
  • Buffer Factor: 1.3
  • Total RAM: 20 GB × (1 + 1) × 1.3 = ~52 GB (~17 GB/node for 3 nodes)
Medium Cache (64-128 GB):
  • Cache Data Size: ~50-100 GB
  • Replication Factor: 1
  • Buffer Factor: 1.3
  • Total RAM: 100 GB × (1 + 1) × 1.3 = ~260 GB (~87 GB/node for 3 nodes)
Large Cache (256+ GB):
  • Cache Data Size: ~200 GB+
  • Replication Factor: 1
  • Buffer Factor: 1.3
  • Total RAM: 200 GB × (1 + 1) × 1.3 = ~520 GB (~104 GB/node for 5 nodes)
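A minimal sketch of the cache RAM formula above, using the replication factor of 1 and buffer factor of 1.3 from the scenarios; the data sizes and node counts are the illustrative values from this page.

def cache_ram_gb(data_gb, replication_factor=1, buffer_factor=1.3, node_count=3):
    # Total RAM = cache data size × (1 + replication factor) × buffer factor
    total = data_gb * (1 + replication_factor) * buffer_factor
    return total, total / node_count

print(cache_ram_gb(20))                  # (~52 GB total, ~17 GB/node for 3 nodes)
print(cache_ram_gb(100))                 # (~260 GB total, ~87 GB/node for 3 nodes)
print(cache_ram_gb(200, node_count=5))   # (~520 GB total, ~104 GB/node for 5 nodes)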
Feature    | Small Cache | Medium Cache | Large Cache
Cache Size | 32 GB       | 64-128 GB    | 256+ GB
CPU        | 4 core      | 8 core       | 16 core
RAM        | 32 GB       | 64-128 GB    | 256+ GB
Disk       | 100 GB SSD  | 200 GB SSD   | 500 GB SSD
Network    | 1 Gbps      | 1-10 Gbps    | 10 Gbps
Node Count | 3 (min)     | 3-5          | 5+

Total Resource Calculation Example

The example below shows total resource requirements for a medium-scale production environment.

Scenario: Medium-Scale Production (Tier 2)

Worker Nodes:
  • 3 node × (4 core + 4 GB RAM + 100 GB Disk) = 12 core + 12 GB RAM + 300 GB Disk
Manager Nodes:
  • 1 node × (4 core + 8 GB RAM + 100 GB Disk) = 4 core + 8 GB RAM + 100 GB Disk
MongoDB Replica Set:
  • 3 node × (4 core + 8 GB RAM + 200 GB Disk) = 12 core + 24 GB RAM + 600 GB Disk
Elasticsearch Cluster:
  • 1 node × (8 core + 32 GB RAM + 2 TB Disk) = 8 core + 32 GB RAM + 2 TB Disk
Cache Cluster:
  • Varies based on cache size and processing intensity. Example: 1 node × (4 core + 8 GB RAM + 100 GB Disk) = 4 core + 8 GB RAM + 100 GB Disk
  • Note: Cache requirements vary based on data size to be stored, access frequency, and replication factor. In intensive access scenarios like rate limit, CPU may be high and RAM low. In large data scenarios like response cache, RAM may be high.
TOTAL (a summing sketch follows this list):
  • CPU: 40 core (Cache: 4 core example value, varies by actual need)
  • RAM: 84 GB (Cache: 8 GB example value, varies by actual need)
  • Disk: ~3.1 TB
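The sketch below simply sums the example components above; every figure is the illustrative per-node value from this scenario, not a fixed requirement.

components = {
    # name: (node_count, cores, ram_gb, disk_gb)
    "worker":        (3, 4, 4, 100),
    "manager":       (1, 4, 8, 100),
    "mongodb":       (3, 4, 8, 200),
    "elasticsearch": (1, 8, 32, 2000),
    "cache":         (1, 4, 8, 100),
}
total_cores = sum(n * c for n, c, r, d in components.values())
total_ram_gb = sum(n * r for n, c, r, d in components.values())
total_disk_gb = sum(n * d for n, c, r, d in components.values())
print(total_cores, total_ram_gb, total_disk_gb)  # 40 cores, 84 GB RAM, 3,100 GB (~3.1 TB) disk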

Capacity Planning Checklist

  • Traffic level is determined (Tier 1/2/3)
  • Daily/hourly/peak traffic estimation is made
  • Traffic growth projection is prepared
  • Worker node requirements are calculated (CPU, RAM, Disk, Network, Node count)
  • Manager node requirements are determined (Standalone or HA)
  • MongoDB data size is estimated (API Proxy count, Audit Log)
  • Elasticsearch log data is estimated (daily/monthly/annual retention)
  • Cache cluster size is determined (based on cacheable data volume)
  • Network bandwidth requirements are calculated
  • Total resource calculation is made (CPU, RAM, Disk)
  • Node counts are determined (for HA and scaling)
  • Deployment model is selected (Topology 1/2/3/4)
Important: This guide is a starting point. Real requirements may vary depending on traffic patterns, API complexity, and policy count. Performance tests must be performed for production environments.