CacheMetricsService collects metrics about the Hazelcast cache, API performance, the JVM, and system health. These metrics fall into the following categories:

  1. Hazelcast cache statistics
  2. API performance metrics
  3. JVM metrics
  4. System metrics

Prometheus Metric Types

Cache metrics are exposed to Prometheus using four metric types: Counter, Gauge, Timer, and Summary. Each type represents a different kind of data and behavior, and it determines which operations are meaningful when you query and analyze the metric.

Counter

A Counter is a monotonically increasing value. It starts at zero when the application starts and resets only when the application is restarted. Counter metrics are ideal for tracking continuously increasing values such as the total request count, error count, or completed transaction count.

Available Operations:

  • sum: The total value itself.
  • rate: Calculates the rate of increase over time (e.g., how much it increases per second).
  • increase: Calculates the total increase over a specific time interval.
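
As a rough illustration of what rate and increase compute, the following Python sketch works on a list of (timestamp, value) samples taken from a Counter. The sample data and the function names are made up for this example. Prometheus additionally extrapolates to the edges of the query window, which this simplified version leaves out.

```python
from typing import List, Tuple

# One scraped sample: (unix timestamp in seconds, counter value).
Sample = Tuple[float, float]


def increase(samples: List[Sample]) -> float:
    """Total growth of a counter across the samples, tolerating restarts."""
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the application restarted and the counter began
        # again at zero, so the whole current value counts as new growth.
        total += curr if curr < prev else curr - prev
    return total


def rate(samples: List[Sample]) -> float:
    """Average per-second growth between the first and last sample."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    return increase(samples) / elapsed if elapsed > 0 else 0.0


# Hypothetical scrapes 15 seconds apart; the third one follows a restart.
samples = [(0.0, 100.0), (15.0, 130.0), (30.0, 10.0), (45.0, 40.0)]
total_increase = increase(samples)
per_second = rate(samples)
```

The restart handling is the reason a Counter should be queried through rate or increase rather than read as a raw value.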

Gauge

A Gauge represents an instantaneous value. This value can increase, decrease, or remain constant. Gauge-type metrics are used to monitor a momentary state or level, such as current memory usage, instantaneous CPU usage, or the number of active threads.

Available Operations:

  • sum: The sum of Gauge values grouped by tags.
  • mean (average): The average of Gauge values grouped by tags.
  • min/max: The minimum or maximum of values grouped by tags.
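
The grouping behind these operations can be sketched in a few lines of Python. The label names and readings below are hypothetical and only stand in for whatever tags a Gauge carries.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# One gauge reading: (labels, value). All values here are invented.
Reading = Tuple[Dict[str, str], float]


def aggregate_by(readings: List[Reading], label: str) -> Dict[str, Dict[str, float]]:
    """Group gauge readings by one label and compute sum, mean, min and max."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for labels, value in readings:
        groups[labels[label]].append(value)
    return {
        key: {"sum": sum(vals), "mean": mean(vals), "min": min(vals), "max": max(vals)}
        for key, vals in groups.items()
    }


readings = [
    ({"area": "heap", "id": "pool-a"}, 40.0),
    ({"area": "heap", "id": "pool-b"}, 60.0),
    ({"area": "nonheap", "id": "pool-c"}, 25.0),
]
by_area = aggregate_by(readings, "area")
```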

Timer

Timer measures how long an operation takes (usually in milliseconds or seconds). It is not a native Prometheus type; libraries such as Micrometer provide it and export it to Prometheus as a count, a total duration, and a maximum, plus histogram buckets (a _bucket series such as apinizer_cache_api_response_time_bucket) when they are enabled. These series provide the average duration, the maximum duration, and percentiles.

Available Operations:

  • sum: Returns the total duration.
  • count: Returns the total number of times the operation was performed.
  • mean (average): Calculates the average duration of the operation.
  • max: Returns the longest observed duration.
  • histogram_quantile: Calculates percentiles.
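
To make the percentile calculation less abstract, here is a simplified Python sketch of the linear interpolation that histogram_quantile applies to cumulative buckets, alongside the mean as sum divided by count. The bucket boundaries, counts, and total duration are invented for the example, and the function omits edge cases that the real PromQL function handles.

```python
import math
from typing import List, Tuple

# One cumulative bucket: (upper bound "le" in seconds, observations <= bound).
Bucket = Tuple[float, float]


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile by interpolating inside the matching bucket."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the +Inf bucket, so the best available
                # answer is the highest finite bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical buckets for a response-time Timer.
buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 99.0), (math.inf, 100.0)]
p50 = histogram_quantile(0.50, buckets)
p95 = histogram_quantile(0.95, buckets)

# Mean duration from the sum and count series (values are invented).
mean_duration = 32.0 / 100.0
```

Because the result is interpolated, its precision depends on how closely the bucket boundaries match the durations that actually occur.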

Summary

Summary records the count and sum of the observed values together with predefined quantiles. Unlike Timer and DistributionSummary, a Summary calculates its quantiles directly: they are computed while the metric is being collected rather than estimated at query time, which makes them more accurate.

Available Operations:

  • sum: The sum of the observed values.
  • count: The number of observed values.
  • quantile: The specified quantiles (<metric_name>{quantile="0.99"}).
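
For readers who want to see what these three parts look like in a scrape, the Python sketch below parses a small payload in the Prometheus text format and derives the mean from sum and count. The payload values are invented, and whether a given Summary actually publishes quantile series depends on how it is configured, so treat the quantile lines as an assumption.

```python
import re
from typing import Dict, List, Tuple

# Matches "name{labels} value"; a trailing timestamp, if any, is ignored.
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')
LABEL = re.compile(r'(\w+)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse(text: str) -> List[Sample]:
    """Parse sample lines, skipping blank lines and # comments."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match:
            labels = dict(LABEL.findall(match.group(2) or ""))
            samples.append((match.group(1), labels, float(match.group(3))))
    return samples


def summary_view(samples: List[Sample], name: str):
    """Return the quantile map and the mean (sum / count) of one Summary."""
    quantiles = {
        labels["quantile"]: value
        for n, labels, value in samples
        if n == name and "quantile" in labels
    }
    total = next(v for n, _, v in samples if n == name + "_sum")
    count = next(v for n, _, v in samples if n == name + "_count")
    return quantiles, (total / count if count else float("nan"))


# Hypothetical scrape output; the numbers are not real measurements.
payload = """
# TYPE cache_gets_latency_seconds summary
cache_gets_latency_seconds{quantile="0.5"} 0.002
cache_gets_latency_seconds{quantile="0.99"} 0.012
cache_gets_latency_seconds_sum 1.5
cache_gets_latency_seconds_count 300
"""
quantiles, mean_latency = summary_view(parse(payload), "cache_gets_latency_seconds")
```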

Cache Metrics

These metrics monitor the performance of the Hazelcast cache. Efficiency is analyzed by tracking cache lookups, insertions, and latencies. Memory cost and the number of entries per partition are also measured.

Metric                          Description                            Type
cache_gets_total                Total cache lookups (hits and misses)  Counter
cache_puts_total                Total cache insertions                 Counter
cache_size                      Current number of entries in cache     Gauge
cache_entries                   Number of entries per cache partition  Gauge
cache_entry_memory_bytes        Memory cost of cache entries           Gauge
cache_gets_latency_seconds      Cache access latency                   Summary
cache_puts_latency_seconds      Cache insertion latency                Summary
cache_removals_latency_seconds  Cache removal latency                  Summary

API Metrics

These metrics track the performance of Cache's APIs. API performance is evaluated with data such as number of requests, response time and error rates.

Metric                             Description                   Type
apinizer_cache_api_requests_total  Total number of API requests  Counter
apinizer_cache_api_response_time   API response time (seconds)   Timer
apinizer_cache_api_errors_total    Total number of API errors    Counter
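
One way to turn the two counters into an error rate is to divide their growth over the same interval. The Python sketch below assumes two scrapes held as plain dictionaries; the function name and the numbers are hypothetical.

```python
from typing import Dict, Optional

REQUESTS = "apinizer_cache_api_requests_total"
ERRORS = "apinizer_cache_api_errors_total"


def error_ratio(prev: Dict[str, float], curr: Dict[str, float]) -> Optional[float]:
    """Share of requests that failed between two scrapes.

    Returns None when there were no new requests or when a counter went
    backwards (a restart), because no meaningful ratio exists in that case.
    """
    requests = curr[REQUESTS] - prev[REQUESTS]
    errors = curr[ERRORS] - prev[ERRORS]
    if requests <= 0 or errors < 0:
        return None
    return errors / requests


# Invented counter values from two consecutive scrapes.
earlier = {REQUESTS: 1000.0, ERRORS: 10.0}
later = {REQUESTS: 1500.0, ERRORS: 15.0}
ratio = error_ratio(earlier, later)
```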

JVM Metrics

These metrics track Cache's memory and thread utilization. JVM performance is analyzed with data such as memory usage, GC (Garbage Collection) pause times and active threads.

Metric                      Description                           Type
jvm_memory_used_bytes       Memory used, by area (heap/non-heap)  Gauge
jvm_memory_committed_bytes  Memory committed, by area             Gauge
jvm_memory_max_bytes        Maximum memory, by area               Gauge
jvm_gc_pause_seconds        GC pause time                         Summary
jvm_threads_live_threads    Number of live threads                Gauge
jvm_threads_daemon_threads  Number of live daemon threads         Gauge
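
A common use of the memory gauges is a utilization ratio per area. The Python sketch below assumes each reading is keyed by an (area, pool) pair and that a pool without a defined limit reports a non-positive maximum; both the key layout and the pool names are assumptions made for this example.

```python
from typing import Dict, Optional, Tuple

# Key: (area, pool name). Value: bytes.
Readings = Dict[Tuple[str, str], float]


def utilization_by_area(used: Readings, maximum: Readings) -> Dict[str, Optional[float]]:
    """Used / max per area, counting only pools that report a positive limit."""
    totals: Dict[str, Tuple[float, float]] = {}
    for (area, pool), limit in maximum.items():
        used_sum, max_sum = totals.setdefault(area, (0.0, 0.0))
        if limit > 0 and (area, pool) in used:
            totals[area] = (used_sum + used[(area, pool)], max_sum + limit)
    # An area with no bounded pool has no meaningful ratio, so report None.
    return {area: (u / m if m > 0 else None) for area, (u, m) in totals.items()}


# Invented values for jvm_memory_used_bytes and jvm_memory_max_bytes.
used = {("heap", "pool-a"): 100.0, ("heap", "pool-b"): 300.0, ("nonheap", "pool-c"): 50.0}
maximum = {("heap", "pool-a"): 200.0, ("heap", "pool-b"): 600.0, ("nonheap", "pool-c"): -1.0}
utilization = utilization_by_area(used, maximum)
```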

System Metrics

These metrics monitor Cache's overall performance with data such as CPU utilization, the number of processors, and the load average. Resource utilization is also evaluated with data such as process uptime and the number of open files.

Metric                    Description                         Type
system_cpu_usage          CPU utilization of the host system  Gauge
system_cpu_count          Number of available processors      Gauge
system_load_average_1m    System load average (1 minute)      Gauge
process_cpu_usage         CPU usage of the JVM process        Gauge
process_uptime_seconds    JVM process uptime                  Gauge
process_files_open_files  Number of open file descriptors     Gauge