Capacity planning is the process of determining the resources required for successful installation and operation of the Apinizer platform. This page is a comprehensive guide prepared for system administrators, network administrators, and infrastructure teams.

Capacity Planning Process

When performing capacity planning, the following factors should be considered:

Traffic Estimation

Traffic estimation forms the basis of capacity planning. The following points should be evaluated:
  • Expected API call count: The API call volume expected on a daily and hourly basis should be determined
  • Peak traffic estimation: Traffic volume during highest traffic periods (e.g., campaign periods, special days) should be estimated
  • Traffic growth projection: Projection should be made for future traffic increase
  • Seasonal variations: Seasonal traffic variations throughout the year should be considered

Deployment Model

The deployment model where Apinizer will be installed directly affects resource requirements. Current options:
  • Topology 1: Test and PoC: Simple installation model where all components run on a single server
  • Topology 2: Professional Installation: Model where components are distributed across different servers
  • Topology 3: High Availability: Model where clustering is done for high availability
  • Topology 4: Inter-Regional Distribution: Global distribution, geographical backup
For detailed information, you can refer to the Deployment Models page.

High Availability

High availability requirements should be planned for critical business applications:
  • Uptime requirements: How long the system needs to run without interruption should be determined (generally 99.9%+)
  • Failover strategy: How traffic will be routed when a component fails should be planned
  • Disaster recovery plan: How the system will be recovered in case of large-scale failures should be planned

Growth Plan

Capacity planning should cover not only current requirements but also future growth:
  • Short-term requirements: Resources required for 6-month period
  • Medium-term requirements: Resources planned for 1-2 year period
  • Long-term requirements: Resource requirements projected for 3 years and above

Tier-Based Capacity and Hardware Requirements

Traffic Categories

Tier   | Traffic Level           | Usage Purpose
Tier 1 | < 500K requests/day     | Test/POC environments
Tier 2 | 500K - 3M requests/day  | Production environments
Tier 3 | > 3M requests/day       | Enterprise production (20M+ requests/day supported with 8 cores)

Worker Nodes (API Gateway)

Worker nodes are the main components that process API traffic. The table below contains performance data based on real benchmark tests.
Feature    | Tier 1 | Tier 2 | Tier 3
CPU        | 2 core | 4 core | 8 core
RAM        | 2 GB   | 4 GB   | 8 GB
Node Count | 1      | 2      | 4
IOPS       | 1,000  | 3,000  | 5,000+
IO Threads | 2      | 4      | 8

RAM Requirement Calculation

Worker node RAM requirements should be calculated based on the factors below.
Calculation Formula:
RAM = (RAM Allocated for the Operating System) + (Max. Thread Count × Thread Stack Size) + ((Average Request Size + Average Response Size) × Concurrent Request Count × Buffer Factor (1.5-2.0))
1. RAM Requirement Based on Thread Count: Thread count directly affects RAM requirements. Stack space is allocated for each thread in the JVM:
  • Maximum Thread Count: Max Thread Count value in connection pool settings
  • Thread Stack Size (-Xss): Stack space allocated for each thread via JVM parameter
    • Default value: Usually 1 MB (may vary by platform and JVM version)
    • Recommended value: -Xss1m (1 MB) or -Xss512k (512 KB) - depending on application requirements
  • Calculation: Maximum thread count × Thread Stack Size = Minimum thread stack RAM requirement
Example Calculation:
  • Maximum thread count: 4,096
  • Thread Stack Size (-Xss): 1 MB (default)
  • Thread Stack RAM requirement: 4,096 × 1 MB = ~4 GB (only for thread stacks)
JVM Parameter:
-Xss1m  # 1 MB stack space per thread
# or
-Xss512k  # 512 KB stack space per thread (less RAM usage)
2. RAM Requirement Based on Average Message Size: The size of request and response messages significantly affects RAM usage:
  • Small Messages (< 1 KB): Minimal RAM requirement
  • Medium Messages (1-10 KB): Medium level RAM requirement
  • Large Messages (> 10 KB): High RAM requirement
  • Very Large Messages (> 100 KB): Very high RAM requirement
3. RAM Requirement Based on File Processing Status: If APIs perform file upload/download operations, additional RAM requirement occurs:
  • No File Processing: No additional RAM requirement
  • Small Files (< 1 MB): Minimal additional RAM
  • Medium Files (1-10 MB): Medium level additional RAM
  • Large Files (> 10 MB): High additional RAM requirement
4. RAM Requirement Based on Instant Load Amount: Estimated instant load amount (peak traffic) affects RAM requirement:
  • Low Load (< 100 reqs/sec): Minimal RAM
  • Medium Load (100-1,000 reqs/sec): Medium level RAM
  • High Load (1,000-10,000 reqs/sec): High RAM
  • Very High Load (> 10,000 reqs/sec): Very high RAM
5. Operating System and Other Components: RAM should be allocated for operating system and other system components:
  • Operating System: Usually 1-2 GB
  • Kubernetes Components: ~500 MB for Kubelet, kube-proxy, etc.
  • Other System Services: ~500 MB
Total RAM Calculation Example: Tier 2 Scenario (4 Core / 4 GB RAM), with a code sketch of this calculation after the list:
  • Thread RAM: 4,096 threads × 1 MB = 4 GB
  • Message RAM: (5 KB average) × 1,000 concurrent requests × 1.5 = ~7.5 MB (minimal)
  • File Processing: None = 0 GB
  • Operating System: ~2 GB
  • Total: ~6 GB (8 GB recommended for safety)
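The snippet below is a minimal sketch of the calculation formula above, assuming the Tier 2 example values (4,096 threads, 1 MB stack, 5 KB average message, 1,000 concurrent requests); all inputs are illustrative and should be replaced with your own measurements.

def worker_ram_gb(os_ram_gb=2.0,            # operating system + Kubernetes components + other services
                  max_threads=4096,          # Routing Connection Pool - Max Thread Count
                  thread_stack_mb=1.0,       # JVM -Xss value
                  avg_message_kb=5.0,        # average request + response size
                  concurrent_requests=1000,
                  buffer_factor=1.5):        # 1.5-2.0
    thread_stack_gb = max_threads * thread_stack_mb / 1024
    message_gb = avg_message_kb / 1024 / 1024 * concurrent_requests * buffer_factor
    return os_ram_gb + thread_stack_gb + message_gb

print(round(worker_ram_gb(), 1))  # ~6.0 GB, matching the example above (8 GB recommended for safety)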
RAM Setting in Kubernetes YAML: In Kubernetes YAML files, the request and limit values should be set to the same value. The share of this RAM that the JVM uses is controlled by a JVM parameter:
resources:
  requests:
    memory: "8Gi"  # Minimum requirement
  limits:
    memory: "8Gi"  # Maximum limit (same as request)
JVM MaxRAMPercentage Parameter: The MaxRAMPercentage parameter determines how much of the RAM allocated to the container the JVM will use. The recommended value is 75%.
Example:
  • Node RAM: 8 GB
  • Container RAM Limit: 8 GB
  • JVM MaxRAMPercentage: 75%
  • JVM RAM to Use: 8 GB × 0.75 = 6 GB
  • Operating System and Other: 8 GB × 0.25 = 2 GB
resources:
  requests:
    memory: "8Gi"
  limits:
    memory: "8Gi"
JVM Parameters:
-server -XX:MaxRAMPercentage=75.0
In this way, the JVM uses 75% (6 GB) of the 8 GB RAM allocated to the container, and the remaining 25% (2 GB) is reserved for the operating system and other system components.
Important: When the Thread Stack Size (-Xss) parameter is not set, the JVM uses its default value (usually 1 MB). If a high concurrent thread count is used, a significant amount of RAM is consumed by thread stacks; for example, ~4 GB of RAM is needed for stack space alone with 4,096 threads. In production environments, you must calculate and test RAM requirements according to your own traffic patterns. If necessary, you can reduce RAM usage by lowering the -Xss parameter (e.g., to 512 KB).
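As a rough cross-check of the split described above, the sketch below computes the JVM share, the operating-system share, and the thread-stack footprint for the 8 GB example; the container size, MaxRAMPercentage, -Xss value, and thread count are the example values from this page, not fixed recommendations.

container_ram_gb = 8            # Kubernetes memory request/limit
max_ram_percentage = 75.0       # -XX:MaxRAMPercentage
xss_mb = 1.0                    # -Xss thread stack size
max_threads = 4096              # connection pool max thread count

jvm_ram_gb = container_ram_gb * max_ram_percentage / 100    # 6.0 GB available to the JVM
os_ram_gb = container_ram_gb - jvm_ram_gb                    # 2.0 GB left for OS and system components
thread_stack_gb = max_threads * xss_mb / 1024                # 4.0 GB used by thread stacks alone

print(jvm_ram_gb, os_ram_gb, thread_stack_gb)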

Benchmark Performance Results

Important: The performance data below are estimated values and may vary depending on many parameters. These values were measured under ideal conditions where the backend service responds quickly (network latency is minimal). In real production environments, performance may vary depending on the following factors:
  • Network latency: Network delay to backend service
  • Backend API performance: Response time and processing capacity of backend service
  • Policy complexity: Number and complexity of policies added to the gateway
  • Request body size: Large payloads require more processing power
  • Concurrent load pattern: Distribution of traffic and peak moments
  • System resources: CPU, RAM and disk I/O usage
Therefore, performance tests must be performed according to your own traffic patterns and infrastructure for production environments.
The performance data below were measured in a real test environment under concurrent thread loads. Tests were performed under ideal conditions where the backend service responds quickly (network latency is minimal).
2 Core / 2 GB RAM Configuration (Tier 1 - Test/POC):
  • GET Requests:
    • 50 thread: ~2,200 reqs/sec (average 22ms)
    • 100 thread: ~2,200 reqs/sec (average 45ms)
    • 250 thread: ~2,100 reqs/sec (average 119ms)
  • POST Requests (5KB body):
    • 50 thread: ~1,900 reqs/sec (average 26ms)
    • 100 thread: ~1,800 reqs/sec (average 56ms)
    • 250 thread: ~1,500 reqs/sec (average 170ms)
4 Core / 4 GB RAM Configuration (Tier 2 - Production):
  • GET Requests:
    • 50 thread: ~8,000 reqs/sec (average 6ms)
    • 500 thread: ~6,700 reqs/sec (average 73ms)
    • 1,000 thread: ~6,700 reqs/sec (average 147ms)
  • POST Requests (5KB body):
    • 50 thread: ~7,300 reqs/sec (average 6ms)
    • 500 thread: ~7,100 reqs/sec (average 69ms)
    • 1,000 thread: ~7,000 reqs/sec (average 141ms)
8 Core / 8 GB RAM Configuration (Tier 3 - Enterprise):
  • GET Requests:
    • 50 thread: ~15,400 reqs/sec (average 3ms)
    • 500 thread: ~15,600 reqs/sec (average 31ms)
    • 1,000 thread: ~15,400 reqs/sec (average 64ms)
    • 2,000 thread: ~14,800 reqs/sec (average 133ms)
    • 4,000 thread: ~14,300 reqs/sec (average 276ms)
    • 8,000 thread: ~11,600 reqs/sec (average 655ms)
  • POST Requests (5KB body):
    • 50 thread: ~13,400 reqs/sec (average 3ms)
    • 500 thread: ~13,600 reqs/sec (average 36ms)
    • 1,000 thread: ~13,500 reqs/sec (average 73ms)
    • 2,000 thread: ~13,200 reqs/sec (average 150ms)
    • 4,000 thread: ~12,800 reqs/sec (average 309ms)
    • 8,000 thread: ~11,100 reqs/sec (average 701ms)
  • POST Requests (50KB body):
    • 50 thread: ~4,700 reqs/sec (average 10ms)
    • 500 thread: ~3,500 reqs/sec (average 142ms)
    • 1,000 thread: ~3,000 reqs/sec (average 326ms)
    • 2,000 thread: ~2,800 reqs/sec (average 710ms)
Capacity Calculation Example: 20 million+ requests per day can be supported with 8 core / 8 GB RAM configuration. Example calculation:
  • 20 million requests per day = Average ~231 reqs/sec
  • Peak traffic (60% of traffic in 20% of day): ~694 reqs/sec
  • 8 core configuration has capacity of ~15,400 reqs/sec (GET) at 1,000 threads
  • Therefore, 20 million+ requests per day can be easily met with 8 cores
Note: These values are estimates and may vary depending on network latency, backend API performance, policy complexity, and other factors. Performance tests should be performed in your own environment for real capacity planning. Horizontal scaling (by adding additional worker nodes via Kubernetes) can be done for higher traffic requirements.
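The short sketch below reproduces the daily-to-peak conversion used in the example above; the 60%/20% peak assumption and the 15,400 reqs/sec benchmark figure are taken from this page and should be replaced with your own measurements.

daily_requests = 20_000_000
avg_rps = daily_requests / 86_400                        # ~231 reqs/sec average
peak_rps = (daily_requests * 0.60) / (86_400 * 0.20)     # ~694 reqs/sec at peak
gateway_capacity_rps = 15_400                            # 8-core GET benchmark at 1,000 threads
print(round(avg_rps), round(peak_rps), peak_rps < gateway_capacity_rps)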
Important Notes:
  • Concurrent Thread vs Request: Concurrent thread count and instant request count are different concepts. A thread can make multiple requests within a given time period. Since gateways generally operate statelessly, concurrent request count and latency measurement are more meaningful.
  • Performance Affecting Factors:
    • Backend service response time
    • Network delay
    • Complexity and number of policies added to the gateway
    • Request body size (large payloads require more processing power)
  • Scaling: Vertical scaling has a limit. To support more concurrent users with acceptable response times, horizontal scaling (can be easily done via Kubernetes) or vertical + horizontal combination should be considered.
  • Policy Impact: Each policy added to the gateway affects performance. While simple policies like “Basic Authentication” have minimal impact, policies requiring high processing power like “Content Filtering” or policies requiring external connections like “LDAP Authentication” have more performance impact.

Worker Node Configuration Settings

Worker node performance depends on the following configuration settings. These settings can be made from the API Gateway Environment Settings screen.
Tier 1 (2 Core / 2 GB RAM) Recommended Settings:
  • IO Threads: 2
  • Routing Connection Pool - Min Thread Count: 512
  • Routing Connection Pool - Max Thread Count: 2,048
  • Routing Connection Pool - Max Connections Total: 2,048
  • Elasticsearch Client - IO Thread Count: 8
Tier 2 (4 Core / 4 GB RAM) Recommended Settings:
  • IO Threads: 4
  • Routing Connection Pool - Min Thread Count: 1,024
  • Routing Connection Pool - Max Thread Count: 4,096
  • Routing Connection Pool - Max Connections Total: 4,096
  • Elasticsearch Client - IO Thread Count: 16
Tier 3 (8 Core / 8 GB RAM) Recommended Settings:
  • IO Threads: 8
  • Routing Connection Pool - Min Thread Count: 1,024
  • Routing Connection Pool - Max Thread Count: 8,192
  • Routing Connection Pool - Max Connections Total: 8,192
  • Elasticsearch Client - IO Thread Count: 32
JVM Parameters:
  • -server -XX:MaxRAMPercentage=75.0
Attention: These performance data were measured under ideal conditions where the backend service responds quickly and network latency is minimal. In real production environments, backend service response time, network latency, and added policies can significantly affect performance. Performance tests must be performed according to your own traffic patterns for production environments.

Manager Nodes

Manager nodes provide the web interface where configurations are made. They do not depend on daily traffic load; they are used for configuration management and interface access. A single node is sufficient for a standalone installation; for production environments, an HA installation with at least 2 nodes is recommended for high availability.

MongoDB Replica Set

MongoDB is Apinizer’s configuration database. It stores API proxy definitions, policy configurations, and metadata information. It does not depend on daily traffic load; requirements are determined based on API proxy count and configuration complexity.
Critical: MongoDB must be configured as a Replica Set. Even for a single node, it must be installed as a replica set; a standalone instance should not be used.

Elasticsearch Cluster

Elasticsearch stores log and analytics data.
Feature                | Tier 1     | Tier 2     | Tier 3
CPU                    | 4 core     | 8 core     | 16 core
RAM                    | 16 GB      | 32 GB      | 64 GB
Disk (Hot)             | 500 GB SSD | 2 TB SSD   | 5+ TB SSD
Disk (Warm)            | 1 TB HDD   | 3 TB HDD   | 10+ TB HDD
Network                | 1 Gbps     | 1-10 Gbps  | 10+ Gbps
Node Count             | 1          | 1 (min)    | 3 (min)
Shard/Index            | 1          | 1 (min)    | 3
Daily Log              | < 10 GB    | 10-100 GB  | > 100 GB
Monthly Log            | ~300 GB    | ~1.5 TB    | ~6 TB+
Annual Log (Retention) | ~3.6 TB    | ~18 TB     | ~72 TB+
Replica Count and Data Size: The total data size to be stored in Elasticsearch varies depending on the replica count. The replica count determines how many copies of the same data are stored and directly affects the disk requirement (a code sketch of this calculation appears after the notes below).
Disk Requirement Calculation:
Total Disk Requirement = (Primary Shard Data Size) × (1 + Replica Count) × Buffer Factor (1.2-1.5)
Example Scenarios:
  • Replica = 0: Data is not replicated. Single copy is kept. Minimum disk requirement.
    • Example: 1 TB primary data → ~1.2 TB disk requirement
  • Replica = 1: One more copy of data is kept. Recommended for high availability.
    • Example: 1 TB primary data → ~2.4 TB disk requirement (1 TB primary + 1 TB replica)
  • Replica = 2: Two more copies of data are kept. For very high availability.
    • Example: 1 TB primary data → ~3.6 TB disk requirement (1 TB primary + 2 TB replica)
Important Notes:
  • Production Environments: Minimum replica = 1 is recommended for high availability. In this case, disk requirement approximately doubles.
  • Test/PoC Environments: Replica = 0 can be used, disk requirement is lower.
  • Disk Planning: Both high availability and disk cost should be considered when determining replica count.
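A minimal sketch of the disk formula above; the 1 TB primary data size and the 1.2 buffer factor are the example values used in the scenarios and should be adjusted to your own retention plan.

def es_disk_tb(primary_data_tb, replica_count=1, buffer_factor=1.2):
    # Total Disk = primary data × (1 + replica count) × buffer factor
    return primary_data_tb * (1 + replica_count) * buffer_factor

print(es_disk_tb(1.0, replica_count=0))  # ~1.2 TB
print(es_disk_tb(1.0, replica_count=1))  # ~2.4 TB
print(es_disk_tb(1.0, replica_count=2))  # ~3.6 TB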
Disk Strategy: Using SSD for Hot tier and HDD for Warm/Cold tier is recommended. This provides cost optimization.

Cache Cluster (Hazelcast)

The Cache Cluster is Apinizer’s distributed cache infrastructure and should be sized based on the data to be stored and the processing to be done. Response cache, routing load balancer information, rate limit counters, and other performance-critical data are stored here. Cache size depends on cacheable data volume, access frequency, and replication settings rather than daily traffic count.

Factors Affecting Sizing

Cache Cluster sizing should be determined based on the following factors:
1. Data Type and Size to be Stored:
  • Response Cache: Size and count of response data to be cached
  • Routing/Load Balancer Information: Backend endpoint information and load balancing data
  • Rate Limit Counters: Counter data used for rate limit (usually small data, but accessed very frequently)
  • OAuth2 Tokens: Data volume for token caching
  • Other Performance Data: Quota, throttling, and other cached data
2. Access Pattern and Processing Intensity:
  • Rate Limit Usage: If rate limit is used, counters are read and written very frequently, so high thread count and high CPU requirement occurs. However, since cached data is small, RAM requirement may be relatively low.
  • Response Cache Usage: If large responses are cached, high RAM requirement occurs.
  • Concurrent Access: How many threads will access the cache at the same time affects CPU and thread requirements.
3. Replication Factor: Replication factor determines how many nodes the same data will be stored on and directly affects RAM requirement:
  • Replication Factor = 0: Data is not replicated. Each node keeps its own data. Total RAM requirement is lower.
  • Replication Factor = 1: One more copy of data is kept. RAM requirement increases for the same data volume.
RAM Calculation Formula:
Total RAM = (Cache Data Size) × (1 + Replication Factor) × Buffer Factor (1.2-1.5)
Example Scenarios:
Small Cache (32 GB):
  • Cache Data Size: ~20 GB
  • Replication Factor: 1
  • Buffer Factor: 1.3
  • Total RAM: 20 GB × (1 + 1) × 1.3 = ~52 GB (~17 GB/node for 3 nodes)
Medium Cache (64-128 GB):
  • Cache Data Size: ~50-100 GB
  • Replication Factor: 1
  • Buffer Factor: 1.3
  • Total RAM: 100 GB × (1 + 1) × 1.3 = ~260 GB (~87 GB/node for 3 nodes)
Large Cache (256+ GB):
  • Cache Data Size: ~200 GB+
  • Replication Factor: 1
  • Buffer Factor: 1.3
  • Total RAM: 200 GB × (1 + 1) × 1.3 = ~520 GB (~104 GB/node for 5 nodes)
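A minimal sketch of the cache RAM formula above, using the replication factor of 1 and buffer factor of 1.3 from the scenarios; the data sizes and node counts are the illustrative values from this page.

def cache_ram_gb(data_gb, replication_factor=1, buffer_factor=1.3, node_count=3):
    # Total RAM = cache data size × (1 + replication factor) × buffer factor
    total = data_gb * (1 + replication_factor) * buffer_factor
    return total, total / node_count

print(cache_ram_gb(20))                  # (~52 GB total, ~17 GB/node for 3 nodes)
print(cache_ram_gb(100))                 # (~260 GB total, ~87 GB/node for 3 nodes)
print(cache_ram_gb(200, node_count=5))   # (~520 GB total, ~104 GB/node for 5 nodes)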
Feature    | Small Cache | Medium Cache | Large Cache
Cache Size | 32 GB       | 64-128 GB    | 256+ GB
CPU        | 4 core      | 8 core       | 16 core
RAM        | 32 GB       | 64-128 GB    | 256+ GB
Disk       | 100 GB SSD  | 200 GB SSD   | 500 GB SSD
Network    | 1 Gbps      | 1-10 Gbps    | 10 Gbps
Node Count | 3 (min)     | 3-5          | 5+

Total Resource Calculation Example

The example below shows total resource requirements for a medium-scale production environment.

Scenario: Medium-Scale Production (Tier 2)

Worker Nodes:
  • 3 node × (4 core + 4 GB RAM + 100 GB Disk) = 12 core + 12 GB RAM + 300 GB Disk
Manager Nodes:
  • 1 node × (4 core + 8 GB RAM + 100 GB Disk) = 4 core + 8 GB RAM + 100 GB Disk
MongoDB Replica Set:
  • 3 node × (4 core + 8 GB RAM + 200 GB Disk) = 12 core + 24 GB RAM + 600 GB Disk
Elasticsearch Cluster:
  • 1 node × (8 core + 32 GB RAM + 2 TB Disk) = 8 core + 32 GB RAM + 2 TB Disk
Cache Cluster:
  • Varies based on cache size and processing intensity. Example: 1 node × (4 core + 8 GB RAM + 100 GB Disk) = 4 core + 8 GB RAM + 100 GB Disk
  • Note: Cache requirements vary based on data size to be stored, access frequency, and replication factor. In intensive access scenarios like rate limit, CPU may be high and RAM low. In large data scenarios like response cache, RAM may be high.
TOTAL (a summing sketch follows this list):
  • CPU: 40 core (Cache: 4 core example value, varies by actual need)
  • RAM: 84 GB (Cache: 8 GB example value, varies by actual need)
  • Disk: ~3.1 TB
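The sketch below simply sums the example components above; every figure is the illustrative per-node value from this scenario, not a fixed requirement.

components = {
    # name: (node_count, cores, ram_gb, disk_gb)
    "worker":        (3, 4, 4, 100),
    "manager":       (1, 4, 8, 100),
    "mongodb":       (3, 4, 8, 200),
    "elasticsearch": (1, 8, 32, 2000),
    "cache":         (1, 4, 8, 100),
}
total_cores = sum(n * c for n, c, r, d in components.values())
total_ram_gb = sum(n * r for n, c, r, d in components.values())
total_disk_gb = sum(n * d for n, c, r, d in components.values())
print(total_cores, total_ram_gb, total_disk_gb)  # 40 cores, 84 GB RAM, 3,100 GB (~3.1 TB) disk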

Capacity Planning Checklist

  • Traffic level is determined (Tier 1/2/3)
  • Daily/hourly/peak traffic estimation is made
  • Traffic growth projection is prepared
  • Worker node requirements are calculated (CPU, RAM, Disk, Network, Node count)
  • Manager node requirements are determined (Standalone or HA)
  • MongoDB data size is estimated (API Proxy count, Audit Log)
  • Elasticsearch log data is estimated (daily/monthly/annual retention)
  • Cache cluster size is determined (based on cacheable data volume)
  • Network bandwidth requirements are calculated
  • Total resource calculation is made (CPU, RAM, Disk)
  • Node counts are determined (for HA and scaling)
  • Deployment model is selected (Topology 1/2/3/4)
Important: This guide is a starting point. Real requirements may vary depending on traffic patterns, API complexity, and policy count. Performance tests must be performed for production environments.