
Summary

With 2,000 concurrent threads on a 1-CPU, 8-core system, a throughput of 15,000 requests per second was achieved.
Since these results vary depending on the response time of the backend service, network latency, and the resource requirements of the policies added to the gateway, you can review the details of our load test in the sections below.

Test Environment

The DigitalOcean platform was used as the infrastructure for the load tests because of its ease of use and quick support.

Load Test Topology

The load test topology is as follows: (Figure: Load Test Topology)

Load Test Server

  • CPU-Optimized
  • Dedicated CPU
  • 8 vCPUs (Intel Xeon Scalable, 2.5 GHz)
  • 16 GB RAM
  • 100 GB Disk
  • CentOS 8.3 x64

Kubernetes Master & MongoDB

  • CPU-Optimized
  • Dedicated CPU
  • 4 vCPUs (Intel Xeon Scalable, 2.5 GHz)
  • 8 GB RAM
  • 50 GB Disk
  • CentOS 8.3 x64

Elasticsearch

  • CPU-Optimized
  • Dedicated CPU
  • 8 vCPUs (Intel Xeon Scalable, 2.5 GHz)
  • 16 GB RAM
  • 100 GB Disk
  • CentOS 8.3 x64

Kubernetes Worker

  • CPU-Optimized
  • Dedicated CPU
  • 16 vCPUs (Intel Xeon Scalable, 2.5 GHz)
  • 32 GB RAM
  • 200 GB Disk
  • CentOS 8.3 x64

Test Setup

1. Load Test Server Setup and JMeter Configuration

JMeter was used for load testing. Setup steps:
yum install java-1.8.0-openjdk -y
java -version
Installation Verification:
openjdk version "1.8.0_275"
OpenJDK Runtime Environment (build 1.8.0_275-b01)
OpenJDK 64-Bit Server VM (build 25.275-b01, mixed mode)
yum install wget -y
wget http://apache.stu.edu.tw//jmeter/binaries/apache-jmeter-5.2.1.tgz
tar -xf apache-jmeter-5.2.1.tgz
Environment Variables (appended to ~/.bashrc so they persist across sessions):
echo 'export JMETER_HOME=/root/apache-jmeter-5.2.1' >> ~/.bashrc
echo 'export PATH=$JMETER_HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
  • Thread count configured parametrically
  • Test duration configured parametrically
  • HTTP requests configured
  • Result collection mechanism set up (a sample run command is sketched below)
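A typical non-GUI invocation for such a run looks like the sketch below; the test plan file name and the property names (threads, duration) are illustrative rather than the exact ones used in these tests:
# Non-GUI run with parametric thread count and duration (names are illustrative)
jmeter -n -t apinizer-load-test.jmx \
  -Jthreads=2000 -Jduration=300 \
  -l results/get-2000.jtl \
  -e -o results/get-2000-report
# Inside the test plan, these properties are read with ${__P(threads)} and ${__P(duration)}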

2. NGINX Configuration

NGINX was used as the load balancer in front of the gateway nodes; a minimal configuration sketch follows the lists below.

NGINX Features

  • Load balancing
  • SSL termination
  • Reverse proxy
  • Health check

Configuration

  • Upstream server definitions
  • Proxy pass settings
  • Timeout settings
  • Connection pool settings
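A minimal configuration sketch is shown below; the upstream addresses, ports, certificate paths, and timeout values are placeholders rather than the exact values used in these tests:
# Illustrative NGINX load-balancer configuration (all values are placeholders)
cat > /etc/nginx/conf.d/apinizer-lb.conf <<'EOF'
upstream apinizer_workers {
    server 10.0.0.11:30080;        # Kubernetes worker NodePort (example)
    server 10.0.0.12:30080;
    keepalive 64;                  # connection pool toward the upstreams
}
server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/ssl/gateway.crt;
    ssl_certificate_key /etc/nginx/ssl/gateway.key;
    location / {
        proxy_pass http://apinizer_workers;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;
    }
}
EOF
nginx -t && systemctl reload nginx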

3. Apinizer and Log Server Setup

Kubernetes (master and worker nodes), MongoDB, Elasticsearch, and Apinizer were installed by following the steps in the Apinizer Installation Documentation.
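After installation, a few quick checks (the Elasticsearch address below is a placeholder) confirm that the cluster and its components are healthy:
kubectl get nodes -o wide          # master and worker nodes should be Ready
kubectl get pods -A                # Apinizer and MongoDB pods should be Running
curl -s 'http://10.0.0.13:9200/_cluster/health?pretty'    # Elasticsearch cluster health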

Important Points of Load Testing

Points to consider when testing:

Asynchronous Logging

Apinizer stores all request & response messages and metrics asynchronously in the Elasticsearch log database. During tests, these logging operations continued as they should.
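One simple way to confirm that asynchronous logging keeps up during a run is to watch the document counts of the log indices in Elasticsearch; the host address below is a placeholder:
curl -s 'http://10.0.0.13:9200/_cat/indices?v'    # document counts should keep growing during the test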

Network Latency

Internal IPs were used in all our tests to reduce network latency and see Apinizer’s real impact.

Pod Restart

We specifically verified that Kubernetes did not restart any pods during the test runs. The restart count is an important parameter because it reflects overload, congestion, or error conditions.
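Restart counts can be checked directly with kubectl; the namespace below is an assumption:
kubectl get pods -n apinizer       # the RESTARTS column should remain 0 throughout the run
kubectl get pods -n apinizer -o custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount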

JVM Monitoring

JVM performance metrics were monitored with JConsole; the JMX service was exposed outside the cluster via Kubernetes.
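One way to expose JMX through Kubernetes and attach JConsole is sketched below; the deployment name, namespace, and JMX port (9010) are assumptions:
kubectl expose deployment apinizer-worker -n apinizer \
  --name=apinizer-worker-jmx --type=NodePort --port=9010 --target-port=9010
kubectl get svc apinizer-worker-jmx -n apinizer    # note the assigned NodePort
# Connect JConsole from the monitoring machine to <worker-node-ip>:<assigned-nodeport>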

Test Scenarios

Test scenarios were configured for different conditions:
  • Minimal configuration: basic API Proxy, tested without policies
  • Medium-level configuration: basic policies, standard usage scenario
  • Advanced configuration: multiple policy support, high-performance scenario
  • Optimized configuration: performance optimizations, production-ready scenario

Load Test Results

GET Requests Results

Condition | Thread Count | Throughput (reqs/sec) | Avg Response Time (ms)
A         | 50           | 1,133                 | 43
A         | 100          | 1,100                 | 90
A         | 250          | 1,025                 | 242
A         | 500          | 963                   | 516
B         | 50           | 2,232                 | 22
B         | 100          | 2,169                 | 45
B         | 250          | 2,089                 | 119
B         | 500          | 1,915                 | 259
B         | 1,000        | 1,762                 | 564
B         | 1,500        | 1,631                 | 915
B         | 2,000        | 1,379                 | 1,441
C         | 50           | 8,090                 | 6
C         | 100          | 7,816                 | 12
C         | 250          | 7,011                 | 35
C         | 500          | 6,759                 | 73
C         | 1,000        | 6,742                 | 147
C         | 1,500        | 6,683                 | 223
C         | 2,000        | 6,692                 | 297
C         | 4,000        | 6,448                 | 617
D         | 50           | 15,420                | 3
D         | 100          | 15,812                | 6
D         | 250          | 15,614                | 15
D         | 500          | 15,664                | 31
D         | 1,000        | 15,454                | 64
D         | 1,500        | 15,026                | 99
D         | 2,000        | 14,839                | 133
D         | 4,000        | 14,356                | 276
D         | 8,000        | 11,603                | 655
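For reference, the throughput and average response time figures in these tables can be derived from JMeter's .jtl output. The sketch below assumes the default CSV layout (timeStamp in the first column, elapsed milliseconds in the second) and the hypothetical results file from the sample command above:
# Average response time and overall throughput from a JTL file
awk -F, 'NR>1 {n++; sum+=$2; if (!min || $1<min) min=$1; if ($1>max) max=$1}
         END {printf "avg(ms)=%.0f  throughput(req/s)=%.0f\n", sum/n, n/((max-min)/1000)}' results/get-2000.jtl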

POST 5KB Requests Results

Condition | Thread Count | Throughput (reqs/sec) | Avg Response Time (ms)
A         | 50           | 1,002                 | 49
A         | 100          | 983                   | 101
A         | 250          | 852                   | 292
B         | 50           | 1,868                 | 26
B         | 100          | 1,768                 | 56
B         | 250          | 1,456                 | 170
B         | 500          | 1,398                 | 355
B         | 1,000        | 1,229                 | 809
B         | 1,500        | 1,199                 | 1,245
C         | 50           | 7,353                 | 6
C         | 100          | 7,257                 | 13
C         | 250          | 7,138                 | 34
C         | 500          | 7,141                 | 69
C         | 1,000        | 7,011                 | 141
C         | 1,500        | 6,935                 | 215
D         | 50           | 13,396                | 3
D         | 100          | 13,482                | 7
D         | 250          | 13,587                | 18
D         | 500          | 13,611                | 36
D         | 1,000        | 13,562                | 73
D         | 1,500        | 13,208                | 112
D         | 2,000        | 13,179                | 150
D         | 4,000        | 12,792                | 309
D         | 8,000        | 11,115                | 701

POST 50KB Requests Results

Condition | Thread Count | Throughput (reqs/sec) | Avg Response Time (ms)
A         | 50           | 675                   | 73
A         | 100          | 653                   | 152
A         | 250          | 554                   | 448
B         | 50           | 1,437                 | 34
B         | 100          | 1,409                 | 70
B         | 250          | 1,223                 | 203
B         | 500          | 1,149                 | 432
B         | 1,000        | 877                   | 1,134
C         | 50           | 4,679                 | 10
C         | 100          | 4,675                 | 21
C         | 250          | 4,020                 | 61
C         | 500          | 3,221                 | 154
C         | 1,000        | 2,962                 | 335
D         | 50           | 4,683                 | 10
D         | 100          | 4,671                 | 21
D         | 250          | 4,382                 | 56
D         | 500          | 3,496                 | 142
D         | 1,000        | 3,046                 | 326
D         | 1,500        | 2,853                 | 522
D         | 2,000        | 2,794                 | 710

Interpreting Results

Concurrent Users and Request Count

Important: A common mistake when examining results is to confuse the session count with the instantaneous (concurrent) request count.
(Chart: Throughput and Concurrent User Count)
(Chart: Average Memory Usage)

Request

An HTTP request made with a specific HTTP method to a specific target

Session

A session may contain zero or more requests.
Keeping session state in a gateway is very rare; service access is generally stateless. Therefore, measuring the concurrent request count and latency is more meaningful.
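As a sanity check, Little's Law (concurrent requests ≈ throughput × average response time) ties these quantities together: for the Condition D GET run with 2,000 threads, 14,839 reqs/sec × 0.133 s ≈ 1,974, which is very close to the configured thread count.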

Scaling

Vertical Scaling

As the concurrent user count increases, throughput rises up to a certain limit and then begins to decline. This natural behavior shows that vertical scaling has a limit.

Horizontal Scaling

To support more concurrent users with acceptable response times, horizontal and vertical scaling should be considered together. Since Apinizer runs on Kubernetes infrastructure, this can be configured very easily and quickly.
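For example, scaling out gateway replicas in Kubernetes is a single command; the deployment name and namespace below are assumptions:
kubectl scale deployment apinizer-worker -n apinizer --replicas=4
kubectl get pods -n apinizer -w    # watch the new gateway pods join behind the load balancer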

Message Size Impact

As message sizes increase, the processing power required per request increases, so throughput decreases and response time rises.
Although request sizes are generally around 1 KB in real-life scenarios, we found it worthwhile to examine 5 KB and 50 KB POST requests, since there was only a very small difference between our 1 KB POST and GET results.
Although the results are naturally lower than those for GET requests, it was pleasing that throughput dropped only to about one-fourth despite a 10-fold increase in payload size.

Memory Usage

RAM usage rates were very consistent throughout the load test. Even when request sizes increased tenfold, no significant increase in RAM usage was observed. This proved that OpenJ9 was the right choice.

Policy Performance Impact

Each policy we add to the gateway affects performance according to its complexity and dependencies.

Basic Authentication Policy Test

Tests were performed with the “Basic Authentication” policy added (Condition D):
Thread Count | GET Throughput (reqs/sec) | GET Avg (ms) | GET with Policy Throughput (reqs/sec) | GET with Policy Avg (ms)
50           | 15,420                    | 3            | 14,760                                | 3
100          | 15,812                    | 6            | 14,843                                | 6
250          | 15,614                    | 15           | 14,891                                | 16
500          | 15,664                    | 31           | 14,748                                | 33
1,000        | 15,454                    | 64           | 14,285                                | 68
1,500        | 15,026                    | 99           | 14,373                                | 102
2,000        | 14,839                    | 133          | 14,280                                | 136
4,000        | 14,356                    | 276          | 13,795                                | 279
8,000        | 11,603                    | 655          | 11,437                                | 672
(Chart: Throughput Comparison, With and Without Policy)
(Chart: Average Memory Usage, With and Without Policy)
As we can see, there was a performance impact, albeit a barely noticeable one. However, if a computationally expensive policy such as “content filtering” or a policy requiring external connections such as “LDAP Authentication” (which also adds network latency) were added, performance would drop much faster.
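For reference, the requests in the policy-enabled runs simply carry standard Basic Authentication credentials; a minimal example with a hypothetical endpoint and credentials looks like this:
curl -u testuser:testpassword -H "Accept: application/json" \
     http://gateway.example.local/apigateway/test-api/get    # endpoint and credentials are placeholders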

Policy Selection Best Practices

Policy Complexity

It is important to know how much load each policy will bring and choose the design accordingly.

External Connections

Policies requiring external connections like LDAP Authentication add network latency and affect performance.

Processing Power

Computationally expensive policies like content filtering increase CPU usage.

Policy Ordering

Policy ordering and conditional execution affect performance.

Performance Metrics

Throughput (Processing Speed)

Throughput shows the number of requests processed per second. With the optimized configuration (Condition D):

GET Requests

15,000+ reqs/sec (2,000 threads)

POST 5KB Requests

13,000+ reqs/sec (2,000 threads)

POST 50KB Requests

2,700+ reqs/sec (2,000 threads)

Response Time

Average response times:

GET Requests

3-655 ms (depending on thread count)

POST 5KB Requests

3-701 ms

POST 50KB Requests

10-710 ms

Scalability

  • Tested up to 8,000 threads on a single node
  • Optimal performance observed at 2,000 threads
  • Throughput starts to drop at higher thread counts
  • Easy scaling with Kubernetes infrastructure
  • Multiple gateway support with load balancer
  • Automatic pod scaling

Results and Recommendations

Key Findings

High Performance

Throughput of 15,000+ reqs/sec achieved with optimized configuration.

Low Latency

Response times starting from 3ms for GET requests were observed.

Memory Efficiency

Even when message size increased tenfold, no significant increase in RAM usage was observed.

Policy Impact

Simple policies like Basic Authentication have minimal performance impact.

Recommendations

  • Use Condition D configuration
  • Perform horizontal scaling with Kubernetes
  • Configure monitoring and alerting
  • Evaluate performance impact of policies
  • Remove unnecessary policies
  • Optimize policy ordering
  • Calculate expected load
  • Optimize thread count
  • Plan horizontal scaling

Next Steps