Capacity Planning Process
When performing capacity planning, the following factors should be considered:
Traffic Estimation
Traffic estimation forms the basis of capacity planning. The following points should be evaluated:
- Expected API call count: The expected API call volume should be determined on a daily and hourly basis
- Peak traffic estimation: Traffic volume during highest traffic periods (e.g., campaign periods, special days) should be estimated
- Traffic growth projection: Future traffic growth should be projected
- Seasonal variations: Seasonal traffic variations throughout the year should be considered
Deployment Model
The deployment model where Apinizer will be installed directly affects resource requirements. Current options:
- Topology 1: Test and PoC: Simple installation model where all components run on a single server
- Topology 2: Professional Installation: Model where components are distributed across different servers
- Topology 3: High Availability: Model where clustering is done for high availability
- Topology 4: Inter-Regional Distribution: Model where components are distributed globally across regions for geographical redundancy
High Availability
High availability requirements should be planned for critical business applications:
- Uptime requirements: How long the system must run without interruption should be determined (generally 99.9%+)
- Failover strategy: How traffic will be routed when a component fails should be planned
- Disaster recovery plan: How the system will be recovered in case of large-scale failures should be planned
Growth Plan
Capacity planning should cover not only current requirements but also future growth (a simple projection sketch follows this list):
- Short-term requirements: Resources required for the next 6 months
- Medium-term requirements: Resources planned for a 1-2 year period
- Long-term requirements: Resource requirements projected for 3 years and beyond
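As a rough illustration of how such a projection might be made, the sketch below compounds an assumed annual growth rate over the three planning horizons. The starting volume and growth rate are placeholder assumptions, not Apinizer figures.

```python
# Hypothetical growth projection; starting volume and growth rate are assumptions.
def project_daily_requests(current_daily: float, annual_growth_rate: float, years: float) -> float:
    """Compound-growth projection of daily request volume."""
    return current_daily * (1 + annual_growth_rate) ** years

current_daily = 1_000_000   # assumed current traffic: 1M requests/day
annual_growth = 0.30        # assumed 30% annual traffic growth

for horizon_years in (0.5, 2, 3):   # short-, medium-, long-term horizons from the list above
    projected = project_daily_requests(current_daily, annual_growth, horizon_years)
    print(f"{horizon_years:>3} years: ~{projected:,.0f} requests/day")
```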
Tier-Based Capacity and Hardware Requirements
Traffic Categories
| Tier | Traffic Level | Usage Purpose |
|---|---|---|
| Tier 1 | < 500K requests/day | Test/POC environments |
| Tier 2 | 500K - 3M requests/day | Production environments |
| Tier 3 | > 3M requests/day | Enterprise production (20M+ requests/day supported with 8 cores) |
Worker Nodes (API Gateway)
Worker nodes are the main components that process API traffic. The table below contains recommended sizing per tier, based on real benchmark tests.
| Feature | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|
| CPU | 2 core | 4 core | 8 core |
| RAM | 2 GB | 4 GB | 8 GB |
| Node Count | 1 | 2 | 4 |
| IOPS | 1,000 | 3,000 | 5,000+ |
| IO Threads | 2 | 4 | 8 |
RAM Requirement Calculation
Worker node RAM requirements should be calculated based on the following factors:
Calculation Formula:
- Maximum Thread Count: The Max Thread Count value in the connection pool settings
- Thread Stack Size (-Xss): Stack space allocated for each thread via the JVM parameter
  - Default value: Usually 1 MB (may vary by platform and JVM version)
  - Recommended value: -Xss1m (1 MB) or -Xss512k (512 KB), depending on application requirements
- Calculation: Maximum thread count × Thread Stack Size = Minimum thread stack RAM requirement
  - Example: Maximum thread count: 4,096
  - Thread Stack Size (-Xss): 1 MB (default)
  - Thread Stack RAM requirement: 4,096 × 1 MB = ~4 GB (for thread stacks only)
Message Size:
- Small Messages (< 1 KB): Minimal RAM requirement
- Medium Messages (1-10 KB): Medium level RAM requirement
- Large Messages (> 10 KB): High RAM requirement
- Very Large Messages (> 100 KB): Very high RAM requirement
File Processing:
- No File Processing: No additional RAM requirement
- Small Files (< 1 MB): Minimal additional RAM
- Medium Files (1-10 MB): Medium level additional RAM
- Large Files (> 10 MB): High additional RAM requirement
Request Load:
- Low Load (< 100 reqs/sec): Minimal RAM
- Medium Load (100-1,000 reqs/sec): Medium level RAM
- High Load (1,000-10,000 reqs/sec): High RAM
- Very High Load (> 10,000 reqs/sec): Very high RAM
System Overhead:
- Operating System: Usually 1-2 GB
- Kubernetes Components: ~500 MB for Kubelet, kube-proxy, etc.
- Other System Services: ~500 MB
Example Calculation (a minimal sketch follows this list):
- Thread RAM: 4,096 threads × 1 MB = 4 GB
- Message RAM: (5 KB average) × 1,000 concurrent requests × 1.5 = ~7.5 MB (minimal)
- File Processing: None = 0 GB
- Operating System: ~2 GB
- Total: ~6 GB (8 GB recommended for safety)
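The sketch below reproduces the example calculation above. The thread count, stack size, message size, concurrency, and overhead values are the assumptions from that example; replace them with your own measurements before sizing a node.

```python
# Worker node RAM estimate, using the example's assumptions.
MB = 1
GB = 1024 * MB

max_threads = 4096            # Routing Connection Pool - Max Thread Count
thread_stack_mb = 1 * MB      # JVM -Xss value
avg_message_kb = 5            # average request/response body size (assumption)
concurrent_requests = 1000    # in-flight requests at peak (assumption)
message_buffer_factor = 1.5   # processing overhead per in-flight message (assumption)
file_processing_mb = 0        # no file payloads in this example
os_and_system_mb = 2 * GB     # OS + Kubernetes components + other services

thread_stack_ram_mb = max_threads * thread_stack_mb
message_ram_mb = avg_message_kb / 1024 * concurrent_requests * message_buffer_factor
total_mb = thread_stack_ram_mb + message_ram_mb + file_processing_mb + os_and_system_mb

print(f"Thread stacks: {thread_stack_ram_mb / GB:.1f} GB")
print(f"Messages     : {message_ram_mb:.1f} MB")
print(f"Total        : ~{total_mb / GB:.1f} GB (round up, e.g. to 8 GB, for headroom)")
```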
The JVM MaxRAMPercentage parameter determines how much of the RAM allocated to the container the JVM will use. The recommended value is 75% (a small sketch of the split follows the example):
Example:
- Node RAM: 8 GB
- Container RAM Limit: 8 GB
- JVM MaxRAMPercentage: 75%
- JVM RAM to Use: 8 GB × 0.75 = 6 GB
- Operating System and Other: 8 GB × 0.25 = 2 GB
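A tiny sketch of this split; the 8 GB container limit and the 75% value are taken from the example above and should be adjusted to your own limits.

```python
# Splitting container RAM between the JVM and the rest of the system.
container_ram_gb = 8.0        # container memory limit (example value)
max_ram_percentage = 75.0     # -XX:MaxRAMPercentage

jvm_ram_gb = container_ram_gb * max_ram_percentage / 100   # memory the JVM may use
reserved_gb = container_ram_gb - jvm_ram_gb                 # left for the OS and native memory

print(f"JVM: {jvm_ram_gb:.1f} GB, reserved: {reserved_gb:.1f} GB")
```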
Benchmark Performance Results
The performance data below were measured in a real test environment under concurrent thread loads. Tests were performed under ideal conditions where the backend service responds quickly (network latency is minimal).
2 Core / 2 GB RAM Configuration (Tier 1 - Test/POC):
- GET Requests:
- 50 thread: ~2,200 reqs/sec (average 22ms)
- 100 thread: ~2,200 reqs/sec (average 45ms)
- 250 thread: ~2,100 reqs/sec (average 119ms)
- POST Requests (5KB body):
- 50 thread: ~1,900 reqs/sec (average 26ms)
- 100 thread: ~1,800 reqs/sec (average 56ms)
- 250 thread: ~1,500 reqs/sec (average 170ms)
4 Core / 4 GB RAM Configuration (Tier 2 - Production):
- GET Requests:
- 50 thread: ~8,000 reqs/sec (average 6ms)
- 500 thread: ~6,700 reqs/sec (average 73ms)
- 1,000 thread: ~6,700 reqs/sec (average 147ms)
- POST Requests (5KB body):
- 50 thread: ~7,300 reqs/sec (average 6ms)
- 500 thread: ~7,100 reqs/sec (average 69ms)
- 1,000 thread: ~7,000 reqs/sec (average 141ms)
8 Core / 8 GB RAM Configuration (Tier 3 - Enterprise):
- GET Requests:
- 50 thread: ~15,400 reqs/sec (average 3ms)
- 500 thread: ~15,600 reqs/sec (average 31ms)
- 1,000 thread: ~15,400 reqs/sec (average 64ms)
- 2,000 thread: ~14,800 reqs/sec (average 133ms)
- 4,000 thread: ~14,300 reqs/sec (average 276ms)
- 8,000 thread: ~11,600 reqs/sec (average 655ms)
- POST Requests (5KB body):
- 50 thread: ~13,400 reqs/sec (average 3ms)
- 500 thread: ~13,600 reqs/sec (average 36ms)
- 1,000 thread: ~13,500 reqs/sec (average 73ms)
- 2,000 thread: ~13,200 reqs/sec (average 150ms)
- 4,000 thread: ~12,800 reqs/sec (average 309ms)
- 8,000 thread: ~11,100 reqs/sec (average 701ms)
- POST Requests (50KB body):
- 50 thread: ~4,700 reqs/sec (average 10ms)
- 500 thread: ~3,500 reqs/sec (average 142ms)
- 1,000 thread: ~3,000 reqs/sec (average 326ms)
- 2,000 thread: ~2,800 reqs/sec (average 710ms)
Capacity Calculation Example: 20 million+ requests per day can be supported with an 8 core / 8 GB RAM configuration. Example calculation:
- 20 million requests per day = ~231 reqs/sec on average
- Peak traffic (60% of traffic in 20% of the day): ~694 reqs/sec
- The 8 core configuration delivers ~15,400 reqs/sec (GET) at 1,000 threads
- Therefore, 20 million+ requests per day can easily be met with 8 cores (see the sketch below)
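The arithmetic behind this example is sketched below; the assumption that 60% of daily traffic arrives in 20% of the day comes directly from the example.

```python
# Daily request volume -> average and peak requests per second.
daily_requests = 20_000_000
seconds_per_day = 24 * 60 * 60

average_rps = daily_requests / seconds_per_day                  # ~231 reqs/sec
peak_rps = (daily_requests * 0.60) / (seconds_per_day * 0.20)   # ~694 reqs/sec

print(f"Average: ~{average_rps:.0f} reqs/sec")
print(f"Peak   : ~{peak_rps:.0f} reqs/sec")
```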
Important Notes:
- Concurrent Thread vs Request: Concurrent thread count and instantaneous request count are different concepts. A thread can make multiple requests within a given time period. Since gateways generally operate statelessly, concurrent request count and latency measurements are more meaningful.
- Factors Affecting Performance:
  - Backend service response time
  - Network latency
  - Complexity and number of policies added to the gateway
  - Request body size (large payloads require more processing power)
- Scaling: Vertical scaling has a limit. To support more concurrent users with acceptable response times, horizontal scaling (easily done via Kubernetes) or a combination of vertical and horizontal scaling should be considered.
- Policy Impact: Each policy added to the gateway affects performance. While simple policies like “Basic Authentication” have minimal impact, policies requiring heavy processing like “Content Filtering” or policies requiring external connections like “LDAP Authentication” have a greater performance impact.
Worker Node Configuration Settings
Worker node performance depends on the following configuration settings. These settings can be made from the API Gateway Environment Settings screen.
Tier 1 (2 Core / 2 GB RAM) Recommended Settings:
- IO Threads: 2
- Routing Connection Pool - Min Thread Count: 512
- Routing Connection Pool - Max Thread Count: 2,048
- Routing Connection Pool - Max Connections Total: 2,048
- Elasticsearch Client - IO Thread Count: 8
Tier 2 (4 Core / 4 GB RAM) Recommended Settings:
- IO Threads: 4
- Routing Connection Pool - Min Thread Count: 1,024
- Routing Connection Pool - Max Thread Count: 4,096
- Routing Connection Pool - Max Connections Total: 4,096
- Elasticsearch Client - IO Thread Count: 16
Tier 3 (8 Core / 8 GB RAM) Recommended Settings:
- IO Threads: 8
- Routing Connection Pool - Min Thread Count: 1,024
- Routing Connection Pool - Max Thread Count: 8,192
- Routing Connection Pool - Max Connections Total: 8,192
- Elasticsearch Client - IO Thread Count: 32
Recommended JVM Parameters:
-server -XX:MaxRAMPercentage=75.0
Manager Nodes
Manager nodes provide the web interface where configurations are made. They do not depend on daily traffic load; they are used for configuration management and interface access. A single node is sufficient for standalone installation; for production environments, an HA installation with at least 2 nodes is recommended for high availability.
MongoDB Replica Set
MongoDB is Apinizer’s configuration database. It stores API proxy definitions, policy configurations, and metadata. It does not depend on daily traffic load; requirements are determined by API proxy count and configuration complexity.
Elasticsearch Cluster
Elasticsearch stores log and analytics data.
| Feature | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|
| CPU | 4 core | 8 core | 16 core |
| RAM | 16 GB | 32 GB | 64 GB |
| Disk (Hot) | 500 GB SSD | 2 TB SSD | 5+ TB SSD |
| Disk (Warm) | 1 TB HDD | 3 TB HDD | 10+ TB HDD |
| Network | 1 Gbps | 1-10 Gbps | 10+ Gbps |
| Node Count | 1 | 1 (min) | 3 (min) |
| Shard/Index | 1 | 1 (min) | 3 |
| Daily Log | < 10 GB | 10-100 GB | > 100 GB |
| Monthly Log | ~300 GB | ~1.5 TB | ~6 TB+ |
| Annual Log (Retention) | ~3.6 TB | ~18 TB | ~72 TB+ |
Elasticsearch replica settings directly affect the disk requirement:
- Replica = 0: Data is not replicated; a single copy is kept. Minimum disk requirement.
  - Example: 1 TB primary data → ~1.2 TB disk requirement
- Replica = 1: One additional copy of the data is kept. Recommended for high availability.
  - Example: 1 TB primary data → ~2.4 TB disk requirement (1 TB primary + 1 TB replica)
- Replica = 2: Two additional copies of the data are kept. For very high availability.
  - Example: 1 TB primary data → ~3.6 TB disk requirement (1 TB primary + 2 TB replica)
- Production Environments: A minimum of replica = 1 is recommended for high availability. In this case, the disk requirement approximately doubles.
- Test/PoC Environments: Replica = 0 can be used; the disk requirement is lower.
- Disk Planning: Both high availability and disk cost should be considered when determining the replica count (a sizing sketch follows this list).
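A minimal sizing sketch combining daily log volume, retention, replica count, and the ~1.2 index overhead factor implied by the examples above; the 50 GB/day and one-year retention inputs are illustrative assumptions.

```python
# Elasticsearch disk estimate: primary data x (1 + replicas) x index overhead.
def es_disk_tb(daily_log_gb: float, retention_days: int, replicas: int, overhead: float = 1.2) -> float:
    primary_tb = daily_log_gb * retention_days / 1024
    return primary_tb * (1 + replicas) * overhead

# Example: 50 GB/day kept for one year
print(f"replica=0: ~{es_disk_tb(50, 365, replicas=0):.1f} TB")
print(f"replica=1: ~{es_disk_tb(50, 365, replicas=1):.1f} TB")
```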
Disk Strategy: Using SSD for Hot tier and HDD for Warm/Cold tier is recommended. This provides cost optimization.
Cache Cluster (Hazelcast)
Cache Cluster is Apinizer’s distributed cache infrastructure and should be sized based on the data to be stored and the processing to be performed. Response cache, routing load balancer information, rate limit counters, and other performance-critical data are stored here. Cache size depends on cacheable data volume, access frequency, and replication settings rather than daily traffic volume.
Factors Affecting Sizing
Cache Cluster sizing should be determined based on the following factors:
1. Data Type and Size to be Stored:
- Response Cache: Size and count of the response data to be cached
- Routing/Load Balancer Information: Backend endpoint information and load balancing data
- Rate Limit Counters: Counter data used for rate limit (usually small data, but accessed very frequently)
- OAuth2 Tokens: Data volume for token caching
- Other Performance Data: Quota, throttling, and other cached data
2. Access Pattern and Processing Intensity:
- Rate Limit Usage: If rate limiting is used, counters are read and written very frequently, so high thread count and high CPU requirements arise. However, since the cached data is small, the RAM requirement may be relatively low.
- Response Cache Usage: If large responses are cached, high RAM requirement occurs.
- Concurrent Access: How many threads will access the cache at the same time affects CPU and thread requirements.
3. Replication Factor:
- Replication Factor = 0: Data is not replicated; each node keeps its own data. Total RAM requirement is lower.
- Replication Factor = 1: One more copy of data is kept. RAM requirement increases for the same data volume.
Example 1 (Small Cache):
- Cache Data Size: ~20 GB
- Replication Factor: 1
- Buffer Factor: 1.3
- Total RAM: 20 GB × (1 + 1) × 1.3 = ~52 GB (~17 GB/node for 3 nodes)
Example 2 (Medium Cache):
- Cache Data Size: ~50-100 GB
- Replication Factor: 1
- Buffer Factor: 1.3
- Total RAM: 100 GB × (1 + 1) × 1.3 = ~260 GB (~87 GB/node for 3 nodes)
Example 3 (Large Cache):
- Cache Data Size: ~200 GB+
- Replication Factor: 1
- Buffer Factor: 1.3
- Total RAM: 200 GB × (1 + 1) × 1.3 = ~520 GB (~104 GB/node for 5 nodes); a sketch of this formula follows these examples
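All three examples follow the same formula; the sketch below captures it, with the Example 1 inputs as defaults.

```python
# Cache RAM: data size x (1 + replication factor) x buffer factor, split across nodes.
def cache_ram(data_gb: float, replication_factor: int, nodes: int, buffer: float = 1.3):
    total_gb = data_gb * (1 + replication_factor) * buffer
    return total_gb, total_gb / nodes

total_gb, per_node_gb = cache_ram(data_gb=20, replication_factor=1, nodes=3)
print(f"Total: ~{total_gb:.0f} GB, per node: ~{per_node_gb:.0f} GB")   # ~52 GB total, ~17 GB/node
```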
| Feature | Small Cache | Medium Cache | Large Cache |
|---|---|---|---|
| Cache Size | 32 GB | 64-128 GB | 256+ GB |
| CPU | 4 core | 8 core | 16 core |
| RAM | 32 GB | 64-128 GB | 256+ GB |
| Disk | 100 GB SSD | 200 GB SSD | 500 GB SSD |
| Network | 1 Gbps | 1-10 Gbps | 10 Gbps |
| Node Count | 3 (min) | 3-5 | 5+ |
Total Resource Calculation Example
The example below shows total resource requirements for a medium-scale production environment.
Scenario: Medium-Scale Production (Tier 2)
Worker Nodes:
- 3 nodes × (4 core + 4 GB RAM + 100 GB Disk) = 12 core + 12 GB RAM + 300 GB Disk
Manager Node:
- 1 node × (4 core + 8 GB RAM + 100 GB Disk) = 4 core + 8 GB RAM + 100 GB Disk
MongoDB Replica Set:
- 3 nodes × (4 core + 8 GB RAM + 200 GB Disk) = 12 core + 24 GB RAM + 600 GB Disk
Elasticsearch:
- 1 node × (8 core + 32 GB RAM + 2 TB Disk) = 8 core + 32 GB RAM + 2 TB Disk
Cache Cluster (Hazelcast):
- Varies based on cache size and processing intensity. Example: 1 node × (4 core + 8 GB RAM + 100 GB Disk) = 4 core + 8 GB RAM + 100 GB Disk
- Note: Cache requirements vary based on the data to be stored, access frequency, and replication factor. In access-intensive scenarios such as rate limiting, CPU needs may be high while RAM stays low; in large-data scenarios such as response caching, RAM needs may be high.
Total (see the summation sketch below):
- CPU: 40 core (Cache: 4 core example value, varies by actual need)
- RAM: 84 GB (Cache: 8 GB example value, varies by actual need)
- Disk: ~3 TB
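The totals above can be tallied with a small sketch like the one below. The per-component figures mirror the example; the cache entry in particular is a placeholder assumption and should be replaced with your actual cache sizing.

```python
# Tally per-component resources: nodes x (cpu, ram_gb, disk_gb) per node.
components = {
    "worker":        (3, 4,  4,  100),
    "manager":       (1, 4,  8,  100),
    "mongodb":       (3, 4,  8,  200),
    "elasticsearch": (1, 8, 32, 2048),
    "cache":         (1, 4,  8,  100),   # placeholder; size to your cache workload
}

total_cpu = sum(n * cpu for n, cpu, _, _ in components.values())
total_ram = sum(n * ram for n, _, ram, _ in components.values())
total_disk_gb = sum(n * disk for n, _, _, disk in components.values())

print(f"CPU: {total_cpu} core, RAM: {total_ram} GB, Disk: ~{total_disk_gb / 1024:.1f} TB")
```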
Capacity Planning Checklist
- Traffic level is determined (Tier 1/2/3)
- Daily/hourly/peak traffic estimation is made
- Traffic growth projection is prepared
- Worker node requirements are calculated (CPU, RAM, Disk, Network, Node count)
- Manager node requirements are determined (Standalone or HA)
- MongoDB data size is estimated (API Proxy count, Audit Log)
- Elasticsearch log data is estimated (daily/monthly/annual retention)
- Cache cluster size is determined (based on cacheable data volume)
- Network bandwidth requirements are calculated
- Total resource calculation is made (CPU, RAM, Disk)
- Node counts are determined (for HA and scaling)
- Deployment model is selected (Topology 1/2/3/4)
Important: This guide is a starting point. Real requirements may vary depending on traffic patterns, API complexity, and policy count. Performance tests must be performed for production environments.

