Summary
With 2,000 concurrent threads on a single-CPU, 8-core system, a throughput of 15,000 reqs/sec was achieved.
Test Environment
The DigitalOcean platform was used as the infrastructure for running the load tests due to its ease of use and quick support.
Load Test Topology
The load test topology is as follows:
Load Test Server
- CPU-Optimized
- Dedicated CPU
- 8 vCPUs (Intel Xeon Scalable, 2.5 GHz)
- 16 GB RAM
- 100 GB Disk
- CentOS 8.3 x64
Kubernetes Master & MongoDB
- CPU-Optimized
- Dedicated CPU
- 4 vCPUs (Intel Xeon Scalable, 2.5 GHz)
- 8 GB RAM
- 50 GB Disk
- CentOS 8.3 x64
Elasticsearch
- CPU-Optimized
- Dedicated CPU
- 8 vCPUs (Intel Xeon Scalable, 2.5 GHz)
- 16 GB RAM
- 100 GB Disk
- CentOS 8.3 x64
Kubernetes Worker
- CPU-Optimized
- Dedicated CPU
- 16 vCPUs (Intel Xeon Scalable, 2.5 GHz)
- 32 GB RAM
- 200 GB Disk
- CentOS 8.3 x64
Test Setup
1. Load Test Server Setup and JMeter Configuration
JMeter was used for load testing. Setup steps (a command sketch follows the list below):
Java Installation
JMeter Installation
Test Scenario Configuration
- Thread count configured parametrically
- Test duration configured parametrically
- HTTP requests configured
- Result collection mechanism set up
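A minimal command sketch of this setup, assuming OpenJDK 11 from the CentOS 8 repositories and a JMeter 5.x archive; the test plan file name `loadtest.jmx` and the property names `THREADS` and `DURATION` are illustrative, not the actual artifacts used in the test:

```bash
# Install Java (OpenJDK 11 from the CentOS 8 repositories)
sudo dnf install -y java-11-openjdk

# Download and unpack Apache JMeter (version shown is illustrative)
wget https://archive.apache.org/dist/jmeter/binaries/apache-jmeter-5.6.3.tgz
tar -xzf apache-jmeter-5.6.3.tgz

# Run the test plan in non-GUI mode; thread count and duration are passed as
# properties so the same .jmx file can be reused for every scenario
./apache-jmeter-5.6.3/bin/jmeter -n \
  -t loadtest.jmx \
  -JTHREADS=2000 -JDURATION=300 \
  -l results.jtl
```

Inside the test plan, the Thread Group can then read these values with `${__P(THREADS)}` and `${__P(DURATION)}`.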
2. NGINX Configuration
NGINX was used as the load balancer; an illustrative configuration sketch follows the lists below.
NGINX Features
- Load balancing
- SSL termination
- Reverse proxy
- Health check
Configuration
- Upstream server definitions
- Proxy pass settings
- Timeout settings
- Connection pool settings
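A minimal sketch of such a configuration, written here as a shell snippet; the upstream address, port, certificate paths, and timeout values are assumptions for illustration, not the values used in the test:

```bash
# Write an illustrative upstream/proxy configuration and reload NGINX
sudo tee /etc/nginx/conf.d/apinizer.conf > /dev/null <<'EOF'
upstream apinizer_gateway {
    # Gateway endpoint on the Kubernetes worker (address/port are examples);
    # max_fails/fail_timeout provide passive health checking
    server 10.0.0.10:30080 max_fails=3 fail_timeout=10s;
    keepalive 256;                       # upstream connection pool
}

server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/ssl/gateway.crt;   # SSL termination
    ssl_certificate_key /etc/nginx/ssl/gateway.key;

    location / {
        proxy_pass http://apinizer_gateway;           # reverse proxy
        proxy_http_version 1.1;
        proxy_set_header Connection "";               # needed for keepalive upstreams
        proxy_connect_timeout 5s;
        proxy_read_timeout    60s;
    }
}
EOF

sudo nginx -t && sudo systemctl reload nginx
```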
3. Apinizer and Log Server Setup
Kubernetes master and worker, MongoDB, Elasticsearch, and Apinizer installations were performed following the steps in the Apinizer Installation Documentation.
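A quick post-installation sanity check (the `apinizer` namespace is an assumption; use whatever namespace the installation guide creates):

```bash
# Confirm all nodes are Ready and the platform pods are Running
kubectl get nodes -o wide
kubectl get pods -n apinizer -o wide
```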
Important Points of Load Testing
Points to consider during testing:
Asynchronous Logging
Apinizer stores all request and response messages and metrics asynchronously in the Elasticsearch log database. These logging operations remained active throughout the tests.
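One way to confirm that logging keeps up during a run is to watch the document counts of the log indices grow in Elasticsearch; the host placeholder and index pattern below are assumptions:

```bash
# Watch log index document counts and sizes while the load test is running
# (index pattern is an assumption; adjust to the actual log index names)
watch -n 10 'curl -s "http://<elasticsearch-host>:9200/_cat/indices/apinizer*?v&h=index,docs.count,store.size"'
```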
Network Latency
Internal IPs were used in all our tests to reduce network latency and see Apinizer’s real impact.
Pod Restart
We specifically verified that Kubernetes did not restart any pods during the test runs. The restart count is an important indicator, as it reflects overload, congestion, or error conditions.
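The restart count can be checked directly from the pod list (the `apinizer` namespace is again an assumption):

```bash
# RESTARTS should remain 0 for the gateway pods throughout the run
kubectl get pods -n apinizer -w

# After the run, look for any container restarts or OOM kills
kubectl describe pods -n apinizer | grep -E "Restart Count|OOMKilled"
```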
JVM Monitoring
JVM performance metrics were monitored with JConsole; the JMX service was exposed externally via Kubernetes.
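A sketch of one way this can be done; the JMX flags, port number, and use of `kubectl port-forward` are assumptions about a possible setup, not a description of the exact configuration used:

```bash
# The gateway JVM must be started with remote JMX enabled, e.g. (illustrative flags):
#   -Dcom.sun.management.jmxremote.port=9010
#   -Dcom.sun.management.jmxremote.rmi.port=9010
#   -Dcom.sun.management.jmxremote.authenticate=false
#   -Dcom.sun.management.jmxremote.ssl=false
#   -Djava.rmi.server.hostname=127.0.0.1

# Forward the JMX port from the gateway pod to the workstation...
kubectl port-forward -n apinizer pod/<gateway-pod-name> 9010:9010

# ...then attach JConsole locally
jconsole 127.0.0.1:9010
```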
Test Scenarios
Test scenarios were configured for different conditions:
Condition A - Basic Configuration
- Minimal configuration
- Basic API Proxy
- Test without policies
Condition B - Medium Level Configuration
- Medium level configuration
- Basic policies
- Standard usage scenario
Condition C - Advanced Configuration
- Advanced configuration
- Multiple policy support
- High performance scenario
Condition D - Optimized Configuration
- Optimized configuration
- Performance optimizations
- Production-ready scenario
Load Test Results
GET Requests Results
| Condition | Thread Count | Throughput (reqs/sec) | Avg Response Time (ms) |
|---|---|---|---|
| A | 50 | 1,133 | 43 |
|  | 100 | 1,100 | 90 |
|  | 250 | 1,025 | 242 |
|  | 500 | 963 | 516 |
| B | 50 | 2,232 | 22 |
|  | 100 | 2,169 | 45 |
|  | 250 | 2,089 | 119 |
|  | 500 | 1,915 | 259 |
|  | 1,000 | 1,762 | 564 |
|  | 1,500 | 1,631 | 915 |
|  | 2,000 | 1,379 | 1,441 |
| C | 50 | 8,090 | 6 |
|  | 100 | 7,816 | 12 |
|  | 250 | 7,011 | 35 |
|  | 500 | 6,759 | 73 |
|  | 1,000 | 6,742 | 147 |
|  | 1,500 | 6,683 | 223 |
|  | 2,000 | 6,692 | 297 |
|  | 4,000 | 6,448 | 617 |
| D | 50 | 15,420 | 3 |
|  | 100 | 15,812 | 6 |
|  | 250 | 15,614 | 15 |
|  | 500 | 15,664 | 31 |
|  | 1,000 | 15,454 | 64 |
|  | 1,500 | 15,026 | 99 |
|  | 2,000 | 14,839 | 133 |
|  | 4,000 | 14,356 | 276 |
|  | 8,000 | 11,603 | 655 |
POST 5KB Requests Results
| Condition | Thread Count | Throughput (reqs/sec) | Avg Response Time (ms) |
|---|---|---|---|
| A | 50 | 1,002 | 49 |
|  | 100 | 983 | 101 |
|  | 250 | 852 | 292 |
| B | 50 | 1,868 | 26 |
|  | 100 | 1,768 | 56 |
|  | 250 | 1,456 | 170 |
|  | 500 | 1,398 | 355 |
|  | 1,000 | 1,229 | 809 |
|  | 1,500 | 1,199 | 1,245 |
| C | 50 | 7,353 | 6 |
|  | 100 | 7,257 | 13 |
|  | 250 | 7,138 | 34 |
|  | 500 | 7,141 | 69 |
|  | 1,000 | 7,011 | 141 |
|  | 1,500 | 6,935 | 215 |
| D | 50 | 13,396 | 3 |
|  | 100 | 13,482 | 7 |
|  | 250 | 13,587 | 18 |
|  | 500 | 13,611 | 36 |
|  | 1,000 | 13,562 | 73 |
|  | 1,500 | 13,208 | 112 |
|  | 2,000 | 13,179 | 150 |
|  | 4,000 | 12,792 | 309 |
|  | 8,000 | 11,115 | 701 |
POST 50KB Requests Results
| Condition | Thread Count | Throughput (reqs/sec) | Avg Response Time (ms) |
|---|---|---|---|
| A | 50 | 675 | 73 |
|  | 100 | 653 | 152 |
|  | 250 | 554 | 448 |
| B | 50 | 1,437 | 34 |
|  | 100 | 1,409 | 70 |
|  | 250 | 1,223 | 203 |
|  | 500 | 1,149 | 432 |
|  | 1,000 | 877 | 1,134 |
| C | 50 | 4,679 | 10 |
|  | 100 | 4,675 | 21 |
|  | 250 | 4,020 | 61 |
|  | 500 | 3,221 | 154 |
|  | 1,000 | 2,962 | 335 |
| D | 50 | 4,683 | 10 |
|  | 100 | 4,671 | 21 |
|  | 250 | 4,382 | 56 |
|  | 500 | 3,496 | 142 |
|  | 1,000 | 3,046 | 326 |
|  | 1,500 | 2,853 | 522 |
|  | 2,000 | 2,794 | 710 |
Interpreting Results
Concurrent Users and Request Count
Throughput & Concurrent Users
Request
An HTTP request made with a specific HTTP method to a specific target.
Session
Zero or more requests can be made per session.
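For a closed-loop test like this (with no think time between requests), the quantities are tied together by Little's Law: throughput ≈ concurrent threads / average response time. As a rough check against the measured data, Condition D GET with 2,000 threads gives 2,000 / 0.133 s ≈ 15,000 reqs/sec, close to the measured 14,839 reqs/sec.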
Scaling
Vertical Scaling
As the concurrent user count increases, throughput increases up to a certain limit and then begins to decline. This natural behavior shows that vertical growth has a limit.
Horizontal Scaling
To support more concurrent users with acceptable response times, horizontal and vertical scaling should be considered together. Since Apinizer runs on Kubernetes, scaling out can be configured very easily and quickly.
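A sketch of scaling out manually with Kubernetes; the deployment name, namespace, and label are assumptions:

```bash
# Add gateway replicas behind the load balancer
kubectl scale deployment <gateway-deployment> -n apinizer --replicas=3

# Verify that the new pods are scheduled and Running
kubectl get pods -n apinizer -l app=<gateway-deployment>
```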
Message Size Impact
As message sizes increase, more processing is required per request, so throughput decreases and response times increase.
GET vs POST 5KB
Although request sizes average around 1 KB in real-life scenarios, we found it worthwhile to examine 5 KB and 50 KB POST requests, since the difference between our 1 KB POST and GET results was very small.
POST 5KB vs POST 50KB
Although the results are naturally lower than for GET requests, it was pleasing that throughput dropped to only about a quarter despite a 10-fold increase in payload size.
Memory Usage
RAM usage was very consistent throughout the load tests. Even when request sizes increased tenfold, no significant increase in RAM usage was observed. This confirmed that OpenJ9 was the right choice.
Policy Performance Impact
Each policy added to the gateway affects performance according to its complexity and dependencies.
Basic Authentication Policy Test
Tests were performed with the "Basic Authentication" policy added (Condition D):
| Thread Count | GET Throughput (reqs/sec) | GET Avg (ms) | With Policy Throughput (reqs/sec) | With Policy Avg (ms) |
|---|---|---|---|---|
| 50 | 15,420 | 3 | 14,760 | 3 |
| 100 | 15,812 | 6 | 14,843 | 6 |
| 250 | 15,614 | 15 | 14,891 | 16 |
| 500 | 15,664 | 31 | 14,748 | 33 |
| 1,000 | 15,454 | 64 | 14,285 | 68 |
| 1,500 | 15,026 | 99 | 14,373 | 102 |
| 2,000 | 14,839 | 133 | 14,280 | 136 |
| 4,000 | 14,356 | 276 | 13,795 | 279 |
| 8,000 | 11,603 | 655 | 11,437 | 672 |
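Taking the 2,000-thread row as an example, the Basic Authentication policy costs roughly (14,839 − 14,280) / 14,839 ≈ 3.8% of throughput and about 3 ms of average response time.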


As we can see, there was a performance impact, albeit a small one. However, if a computationally expensive policy such as content filtering were added, or a policy requiring external connections such as LDAP Authentication (which also adds network latency), performance would drop more sharply.
Policy Selection Best Practices
Policy Complexity
It is important to know how much load each policy will bring and choose the design accordingly.
External Connections
Policies requiring external connections like LDAP Authentication add network latency and affect performance.
Processing Power
Computationally expensive policies like content filtering increase CPU usage.
Policy Ordering
Policy ordering and conditional execution affect performance.
Performance Metrics
Throughput (Processing Speed)
Throughput shows the number of requests processed per second. With the optimized configuration in Condition D:
GET Requests
15,000+ reqs/sec (2,000 threads)
POST 5KB Requests
13,000+ reqs/sec (2,000 threads)
POST 50KB Requests
2,700+ reqs/sec (2,000 threads)
Response Time
Average response times:
GET Requests
3-655 ms (depending on thread count)
POST 5KB Requests
3-701 ms
POST 50KB Requests
10-710 ms
Scalability
Vertical Scaling
- Tested up to 8,000 threads on a single node
- Optimal performance observed at 2,000 threads
- Throughput starts to drop at higher thread counts
Horizontal Scaling
- Easy scaling with Kubernetes infrastructure
- Multiple gateway support with load balancer
- Automatic pod scaling (a sketch follows below)
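A sketch of automatic pod scaling with a HorizontalPodAutoscaler; the deployment name, namespace, and thresholds are assumptions:

```bash
# Scale the gateway deployment between 2 and 10 replicas based on CPU usage
kubectl autoscale deployment <gateway-deployment> -n apinizer \
  --cpu-percent=70 --min=2 --max=10

# Inspect the autoscaler's current state
kubectl get hpa -n apinizer
```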
Results and Recommendations
Key Findings
High Performance
A throughput of 15,000+ reqs/sec was achieved with the optimized configuration.
Low Latency
Response times as low as 3 ms were observed for GET requests.
Memory Efficiency
Even when message size increased tenfold, no significant increase in RAM usage was observed.
Policy Impact
Simple policies like Basic Authentication have minimal performance impact.
Recommendations
Production Environment
- Use Condition D configuration
- Perform horizontal scaling with Kubernetes
- Configure monitoring and alerting
Policy Management
- Evaluate performance impact of policies
- Remove unnecessary policies
- Optimize policy ordering
Capacity Planning
- Calculate expected load
- Optimize thread count
- Plan horizontal scaling

