Aerospike Monitoring Stack Release Notes

  • 3.0.0
    Release Date: November 9, 2023
    • Aerospike Monitoring Stack version 3.0.0 is a major upgrade with redesigned dashboards that work across multiple Aerospike server versions.
      • Establishes a consistent design pattern, with status displayed at the top and detailed time-range panels below.
      • All dashboards and alerts are forward compatible with 7.x versions and backward compatible with 5.x versions.
      • Removed unused and deprecated dashboards (Alerts, Exporters, and Jobs).

    Breaking Changes

    • Alert severity is now a string: critical, error, warn, or info. The earlier number-based severity is deprecated.
    • New alerts for 7.0 metrics, connectors, and bug fixes use string-type severity only.
    • Removed the 3 deprecated dashboards: Alerts, Exporters, and Jobs.

    New Features

    • [OM-116] - Add DynaTrace to the OTEL Examples.
      • Added documentation and example otel-collector configurations for integrating Aerospike metrics with Dynatrace.
    • [OM-127] - Node View - Handle 7.0 metric changes.
      • Revamped the dashboard for the 7.0 metrics: it now displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
      • Data, index, and memory metrics are shown as minimum, average, and maximum, making anomalies across namespaces easy to identify.
    • [OM-128] - Namespace View - Handle 7.0 metric changes.
      • Revamped the dashboard for the 7.0 metrics: it now displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
      • Data, index, and memory metrics are shown as minimum, average, and maximum, making anomalies across all nodes easy to identify.
    • [OM-129] - Unique Data View - Handle 7.0 metric changes.
      • Revamped the dashboard for the 7.0 metrics: it now displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
      • Displays usage across clusters and historical usage for each cluster.
      • Data is displayed in three layers: 1. all clusters, 2. a single cluster, 3. each namespace within a cluster.
    • [OM-130] - Update Alert Rule to Handle 7.0 metric changes.
      • Enhanced alerts to use 7.x metrics and renamed the previous alerts with a "pre7x" prefix.
      • Alerts added, modified, or renamed:
        • Modified - NamespaceDataCloseToStopWrites, LowDataAvailWarning, LowDataAvailCritical.
        • Added - HighDataUseNamespaceWarning, HighDataUseNamespaceCritical.
        • Renamed - pre7x_NamespaceSetQuotaWarning, pre7x_NamespaceSetQuotaAlertCritical.
    • [OM-133] - Set Index - Handle 7.0 metric changes.
      • Revamped the dashboard for the 7.0 metrics: it now displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
      • Data metrics are shown as minimum, average, and maximum, making anomalies across all nodes easy to identify.
    • [OM-134] - All Flash - Handle 7.0 metrics.
      • Revamped the dashboard for the 7.0 metrics: it now displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
    • [OM-135] - Rolling Restart Dashboard - Handle 7.0 metric changes.
      • Revamped the dashboard for the 7.0 metrics: it now displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
      • Data and memory panels now show both top-k and bottom-k, highlighting over-utilized and under-utilized resources.
    • [OM-136] - Cluster view - Handle 7.0 metric changes.
      • Revamped the dashboard for the 7.0 metrics: it now displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
    • [OM-139] - Handle 7.0 - Multi cluster view.
      • Revamped the dashboard for the 7.0 metrics: it now displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
      • Topology diagram now shows dashboards and connectors using different diagrams.
    • [OM-144] - Remove deprecated job and alerts (old) dashboards.
      • Removed the 3 deprecated dashboards (Jobs, Exporters, and Alerts). The Alerts dashboard was replaced by the new Alerts View dashboard in a previous release.
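
    The minimum/average/maximum and top-k/bottom-k views described above reduce many per-namespace or per-node series to a few numbers. The dashboards do this with PromQL aggregations (min, avg, max, topk, bottomk); the Python below is only a minimal sketch of the same arithmetic, and the function names and sample namespace/node names are hypothetical.

```python
import heapq
from statistics import mean


def summarize(values_by_namespace: dict[str, float]) -> dict[str, float]:
    """Collapse one metric (e.g. data used %) across namespaces into min/avg/max."""
    values = list(values_by_namespace.values())
    return {"min": min(values), "avg": mean(values), "max": max(values)}


def top_and_bottom_k(values_by_node: dict[str, float], k: int):
    """Return the k highest (over-utilized) and k lowest (under-utilized) entries."""
    items = values_by_node.items()
    top = heapq.nlargest(k, items, key=lambda kv: kv[1])
    bottom = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    return top, bottom


if __name__ == "__main__":
    # Hypothetical per-node memory usage percentages.
    usage = {"node-a": 42.0, "node-b": 91.5, "node-c": 12.5}
    print(summarize(usage))
    print(top_and_bottom_k(usage, 1))
```

    A large gap between the minimum and maximum is the signal the panels are designed to surface; the average alone would hide it.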

    Bug Fixes

    • [OM-140] - Standardized alert severity colors and fixed a bug so that the info alert count now displays correctly.
    • [OM-114] - AMS - Change ordering of memory free pct graph on Rolling Restart dashboard.
    • [OM-74] - Avoid average function in namespace dashboard.
    • [OM-105] - Monitoring dashboard "Namespace" does not show namespace level values.
    • [OM-109] - Improve Dashboard Queries and Linting.

  • 2.8.0
    Release Date: September 20, 2023
    • The v2.8.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
    • This release includes one major feature: connector dashboards, alerts, and topology.
    • Aerospike Monitoring Stack version 2.8.0 adds 2 dashboards, alerts, and bug fixes:
      • 2 dashboards to monitor connectors and connector JVM metrics.
      • Enhanced alerts to cover various aspects of Connector key metric thresholds and JVM health.
    • NOTE:
      • Aerospike Prometheus exporter 1.13.0 or greater must be used to get the Aerospike 6.4 metrics.
      • The Multi-Cluster View dashboard now requires the Diagram Panel plugin.

    New Features

    • [OM-64] - Create predefined Prometheus alert rules for Connectors.
      • This release includes 6 alerts covering the functional status and process health of the Connectors.
        • Key alerts covered are connector-status, connector-request-lag, connector-request-errors, JVM heap, JVM CPU, and JVM GC.
    • [OM-56] - Connectors alerts & Dashboards
      • Connector View dashboard for monitoring 6 connectors.
        • Supported connectors: xdr-proxy, kafka-outbound, pulsar-outbound, esp-outbound, elastic-search, and jms-outbound.
        • Key metrics covered: request lag, request errors, success, skipped, connections, XDR record byte size, etc.
    • [OM-107] - Create a dashboard for a Connector(s)
      • Connector JVM View dashboard for monitoring the JVM health of 6 connectors.
        • Supported connectors: xdr-proxy, kafka-outbound, pulsar-outbound, esp-outbound, elastic-search, and jms-outbound.
        • Key metrics covered are - uptime, cpu, memory, threads, files, classes and buffers.
    • Multi-cluster view dashboard is enhanced to display Aerospike Server topology using the cluster-name and xdr dc configurations.
      • NOTE:
        • To view the data replication topology in the Multi-Cluster View, cluster-name must be configured, and each destination cluster's cluster-name must be used as the name of the DC in the xdr section of the Aerospike Server configuration.
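
    A minimal sketch of how a replication topology can be derived from configuration, assuming (as the note above states) that each DC name in a cluster's xdr section equals the destination cluster's cluster-name. The input dictionary shape and the function name are hypothetical; DC names that match no known cluster (for example, a connector destination) are simply skipped here rather than drawn.

```python
def replication_edges(xdr_dcs_by_cluster: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Build (source cluster, destination cluster) edges from XDR DC names.

    Keys are cluster-name values; each value lists the DC names configured in
    that cluster's xdr section. A DC name becomes an edge only when it equals
    the cluster-name of another known cluster.
    """
    known_clusters = set(xdr_dcs_by_cluster)
    edges = []
    for source, dc_names in sorted(xdr_dcs_by_cluster.items()):
        for dc in dc_names:
            if dc in known_clusters:
                edges.append((source, dc))
    return edges


if __name__ == "__main__":
    # Hypothetical two-cluster active-active setup plus one non-cluster DC.
    config = {"east": ["west"], "west": ["east", "kafka-dc"]}
    print(replication_edges(config))
```

    If the DC name does not match the destination cluster-name, the edge cannot be resolved, which is why the naming rule in the note is mandatory for the topology view.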

    Bug Fixes

    • [OM-122] - Avoid duplicate defrag metric values on the namespace dashboard.
    • [OM-113] - Namespace view dashboard - average objects per sprig stat.
    • [OM-120] - Add high-water mark breached to the Rolling Restart dashboard.

  • 2.7.0
    Release Date: August 28, 2023
    • The v2.7.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
    • This release includes 2 major features: enhanced alerts and the All Flash use-case dashboard.
    • Aerospike Monitoring Stack version 2.7.0 adds a new dashboard, enhanced alerts, and bug fixes:
      • All Flash dashboard, showing key metrics to monitor when the primary index and secondary indexes are stored on flash.
      • Enhanced alerts covering various server metrics; this release adds alerts on namespaces, XDR, latencies, best-practices checks, node-exporter, etc.
    • Aerospike Prometheus exporter 1.13.0 or greater must be used to get the Aerospike 6.4 metrics.

    New Features

    • [OM-104] - Add new XDR bytes-shipped metrics to dashboards.
      • Displays bytes shipped as both a stat and a time series, which helps monitor replication progress.
    • [OM-98] - Observability & Management Alerts - Enhance / enrich prometheus alerts from ACMS.
      • This release includes 40 alerts covering various Aerospike Server metrics. Key areas include:
        • Namespaces, latencies, data replication (XDR), sets, node-exporter, flash, best-practices checks, etc.
    • [OM-93] - Use-case Dashboard: all-flash.
      • A new use-case dashboard focused on key metrics and alerts related to flash usage.
        • Key metrics include average objects per sprig, index pressure, primary index flash, secondary index flash, etc.
    • [OM-48] - Use-case Dashboard Organization & Naming.
      • Added brief descriptions on each dashboard and updated tags to identify each dashboard easily.
    • [OM-111] - Observability dashboard unit tests.
      • Created a framework to test dashboards automatically, including panels, expressions/queries, layout, and expression results.
    • [OM-103] - Add user stat related alerts.
      • Added user-stat-specific alerts covering connections, connection churn, etc.
    • [OM-101] - Add warning for best practice failures.
      • Alerts if best practices are not followed when setting up the Aerospike server. The server sets this flag after running a series of checks.
    • [OM-102] - Add warning for node-exporter not being present.
      • As a precursor to integrating node-exporter metrics into the Aerospike Monitoring Stack, this alert raises a warning in the Alerts View dashboard if node-exporter is not configured.
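
    Bytes shipped (the aerospike_xdr_bytes_shipped counter added in 2.2.0) only grows, so a time-series panel is most useful as a per-second rate. The dashboards presumably do this with PromQL; the Python below is a simplified, hypothetical sketch of turning counter samples into rates, treating any drop in the counter as a reset (for example, after a node restart).

```python
def per_second_rates(samples: list[tuple[float, float]]) -> list[float]:
    """Convert (timestamp_seconds, counter_value) samples into per-second rates.

    A decrease in the counter is treated as a reset, in which case the
    post-reset value is taken as the increase for that interval. This is a
    simplification of how Prometheus handles counter resets.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        increase = v1 - v0 if v1 >= v0 else v1
        rates.append(increase / (t1 - t0))
    return rates


if __name__ == "__main__":
    # Hypothetical bytes-shipped samples taken every 10 seconds, with a reset.
    shipped = [(0, 0), (10, 1000), (20, 3000), (30, 500)]
    print(per_second_rates(shipped))
```

    A flat or falling rate while writes continue on the source cluster is one hint that replication is not keeping up.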

  • 2.6.1
    Release Date: August 3, 2023
    • The v2.6.1 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
    • Aerospike Monitoring Stack version 2.6.1 adds bug fixes.
    • Aerospike Prometheus exporter 1.12.0 or greater must be used to get the Aerospike 6.3 metrics.
    • Deprecated
      • Existing Alerts dashboard is deprecated and will be removed in future releases.
      • Existing Jobs dashboard is deprecated and will be removed in future releases.

    Bug Fixes

    • [OM-100] - Issues in Multi-cluster view dashboard
      • Corrected label and unit in XDR panel.
      • Corrected links from XDR and Latencies to respective dashboards (instead of cluster-view).
      • Added an alert-severity-based filter.
    • Issues in Alerts view
      • Panel colors are corrected according to the severity types.
    • Issues in Unique Data view
      • Fixed unique data bytes not showing correctly when custom labels are enabled in the configuration.
      • Added a historical time series for the unique data bytes data point.

  • 2.6.0
    Release Date: July 12, 2023
    • The v2.6.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
    • This release eliminates instances of hard coded values for variables. As a result, the user needs to ensure that the Aerospike Prometheus data source is selected as a default in order for dashboard data to populate correctly.
    • Aerospike Monitoring Stack version 2.6.0 adds new dashboards and bug fixes:
      • Rolling Restarts dashboard, showing key metrics to monitor during specific use cases.
      • Alerts View dashboard, adopting more meaningful alert severity levels.
    • Aerospike Prometheus exporter 1.12.0 or greater must be used to get the Aerospike 6.3 metrics.
    • Deprecated
      • Existing Alerts dashboard is deprecated and will be removed in future releases.
      • Existing Jobs dashboard is deprecated and will be removed in future releases.

    New Features

    • [OM-79] - Rolling Restarts dashboard, with data grouped into stats, errors, and resources.
      • This dashboard curates key metrics to monitor during specific use cases, such as:
        • Node restart
        • Software upgrade
        • Investigation
        • etc.
      • Resource utilization is displayed for the top-k consumers at the service and namespace levels.
    • [OM-85] - Added the new Alerts View dashboard, which visualizes alerts by severity, both as counts and individually.
      • Newly adopted alert levels, in decreasing order of severity:
        • critical, error, warn and info.
      • This dashboard replaces the existing Alerts dashboard.
    • [OM-82] - All Aerospike dashboards and panel visualizations are updated for Grafana 9.x.
    • [OM-49] - Improved and reorganized Aerospike Monitoring stack examples:
      • Reorganized the Docker Compose file into the relevant folder.
      • Added examples of using AeroLab to spin up Aerospike clusters for proof-of-concept (POC) needs.
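
    The Alerts View groups alerts by string severity and shows a count per level. A minimal sketch of that counting, assuming each firing alert carries a "severity" label with one of the four levels listed above; the alert dictionary shape and function name are hypothetical.

```python
from collections import Counter

# Severity levels in decreasing order, as listed for the Alerts View dashboard.
SEVERITY_ORDER = ["critical", "error", "warn", "info"]


def count_by_severity(alerts: list[dict[str, str]]) -> list[tuple[str, int]]:
    """Count alerts per severity, most severe first, including zero counts.

    Unknown severities (for example, a leftover numeric value) raise
    ValueError instead of being silently dropped from the totals.
    """
    counts = Counter()
    for alert in alerts:
        severity = alert["severity"]
        if severity not in SEVERITY_ORDER:
            raise ValueError(f"unexpected severity: {severity!r}")
        counts[severity] += 1
    return [(level, counts[level]) for level in SEVERITY_ORDER]


if __name__ == "__main__":
    firing = [
        {"alertname": "ExampleAlertA", "severity": "warn"},
        {"alertname": "ExampleAlertB", "severity": "critical"},
    ]
    print(count_by_severity(firing))
```

    Rejecting unknown severities is a deliberate choice here: a miscounted info or warn total is exactly the kind of bug that is easy to miss on a dashboard.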

    Bug Fixes

    • [OM-82] - Includes bug fixes related to queries and visualizations:
      • All queries now use a proper regex pattern to honor single- or multiple-value template variable selections.
      • All time series are adjusted to use range vectors.
      • All dashboards have standardized template variables in the same order.
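
    With a multi-value template variable, a query's label matcher must accept any of the selected values, which is why the queries use a regex match (=~) rather than an equality match. A minimal sketch of building such a pattern, assuming values are escaped and alternated; Grafana does its own escaping and formatting, so the exact string it produces may differ from this hypothetical helper.

```python
import re


def variable_regex(selected: list[str]) -> str:
    """Build a regex matching any of the selected template-variable values.

    Each value is escaped so characters such as '.' match literally. PromQL
    anchors label regex matches, which re.fullmatch imitates below.
    """
    if not selected:
        raise ValueError("at least one value must be selected")
    escaped = [re.escape(value) for value in selected]
    return escaped[0] if len(escaped) == 1 else "(" + "|".join(escaped) + ")"


if __name__ == "__main__":
    pattern = variable_regex(["ns1", "ns2"])
    print(pattern)
    print(bool(re.fullmatch(pattern, "ns2")))
```

    An equality matcher would only ever match one value, so a panel would silently drop the other selections.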

  • 2.5.0
    Release Date: June 19, 2023
    • NOTE: The v2.5.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
    • Aerospike Monitoring Stack version 2.5.0 adds the new Multi-Cluster View dashboard, OTel integration examples, and bug fixes.
      • NOTE: Aerospike Prometheus exporter 1.11.0 or greater must be used to get the Aerospike 6.3 metrics.

    New Features

    • [OM-45] - Added the new Multi-Cluster View dashboard, which visualizes multiple clusters across regions and data centers with a focus on health. This dashboard consists of 4 panels:
      • Geomap panel - displays multiple cluster view.
      • Cluster panel - displays key metrics like size, alerts, XDR lag, Read & Write latencies.
      • Node panel - uses the Polystat plugin and displays nodes in Green or Red indicating the health.
      • Namespace panel - displays namespaces in Green or Red indicating the health.
      • Key metrics used in this dashboard:
        • aerospike_node_up
        • aerospike_namespace_objects
        • aerospike_node_stats_cluster_size
        • aerospike_xdr_lag
        • aerospike_latencies_write_ms_bucket
        • aerospike_latencies_read_ms_bucket
    • [OM-60] - Added new examples showing how to integrate the Aerospike Prometheus exporter with the OTel Collector and export metrics to a partner solution.
      • Partner integration examples are provided for New Relic, Datadog, and CloudWatch.

    Bug Fixes

    • [OM-76] - In the Namespace dashboard, aggregation in the Defrag row hid anomalies.
      • Removed the Defrag row: aggregation was removed, and the defrag panels were moved to the namespace row to display defrag metrics for each namespace.

  • 2.4.0
    Release Date: May 16, 2023
    • NOTE: The v2.4.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
    • Aerospike Monitoring Stack version 2.4.0 adds support for metrics introduced in Aerospike 6.3.
      • NOTE: Aerospike Prometheus exporter 1.11.0 or greater must be used to get the Aerospike 6.3 metrics.

  • 2.3.1
    Release Date: April 19, 2023

    Bug Fixes

    • [OM-37] - Issues in Set view, Unique data view, Sindex view, Namespace view and Node view:
      • Fixed issue in "Set view" dashboard to remove hardcoded datasource.
      • Re-exported the Set view, Unique data view, Sindex view, Namespace view, and Node view dashboards with the correct configurations so they can be made available in Grafana Cloud.

  • 2.3.0
    Release Date: April 3, 2023
    • NOTE: The v2.3.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
    • Aerospike Monitoring Stack version 2.3.0 adds support for metrics introduced in Aerospike 6.3.
      • NOTE: Aerospike Prometheus exporter 1.10.0 or greater must be used to get the Aerospike 6.3 metrics.

    Features

      • Added 6.3 metrics:
        • Adds aerospike_sindex_used_bytes secondary index metric.
        • Adds aerospike_namespace_nsup_cycle_deleted_pct NSUP metric.
        • Adds aerospike_sets_stop_writes_size set level configuration.
      • Updated the memory used panel in the Secondary Index dashboard to use aerospike_sindex_used_bytes or aerospike_sindex_memory_used, because aerospike_sindex_memory_used is deprecated in Aerospike 6.3.
      • Added nsup metrics panel to Namespace view dashboard.
      • Added set level quotas panel to Namespace view dashboard.
      • Added a new dashboard displaying set level metrics.
      • Added a new dashboard displaying unique data usage.
      • Added 4 new prometheus alerts:
        • NamespaceSupervisorFallingBehind, raised when NSUP is falling behind; it also shows how long the most recent NSUP cycle lasted.
        • NamespaceFreeMemoryCloseToStopWrites, raised when an Aerospike node's memory is close to the stop-writes limit configured for a namespace.
        • NamespaceSetQuotaWarning, raised when an Aerospike node reaches 80% of the quota configured on a set.
        • NamespaceSetQuotaAlert, raised when an Aerospike node reaches 99% of the quota configured on a set.
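
    The two set-quota alerts above differ only in their threshold. A minimal sketch of that decision for a single node, using the 80% and 99% figures from these notes; whether the real rules compare with >= or >, and which usage metric they divide by the quota, are assumptions here, as is treating a quota of 0 as "no quota configured".

```python
from typing import Optional

# Thresholds from the 2.3.0 notes: warn at 80% of the set quota, alert at 99%.
WARNING_PCT = 80.0
ALERT_PCT = 99.0


def set_quota_alert(used_bytes: float, quota_bytes: float) -> Optional[str]:
    """Return the name of the set-quota alert that would fire, or None."""
    if quota_bytes <= 0:
        return None  # assumption: a non-positive quota means no quota is set
    used_pct = 100.0 * used_bytes / quota_bytes
    if used_pct >= ALERT_PCT:
        return "NamespaceSetQuotaAlert"
    if used_pct >= WARNING_PCT:
        return "NamespaceSetQuotaWarning"
    return None


if __name__ == "__main__":
    for used in (50, 85, 99):
        print(used, set_quota_alert(used, 100))
```

    Checking the higher threshold first ensures a node at 99% reports the alert rather than only the warning.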

  • 2.2.0
    Release Date: August 26, 2022
    • The version 2.2.0 Grafana dashboards are not backwards compatible with Aerospike servers older than 6.0.0.0.
    • Aerospike Monitoring Stack version 2.2.0 adds support for metrics introduced in Aerospike 6.1.
    • Aerospike Prometheus Exporter version 1.8.0 or greater must be used to get the Aerospike 6.1 metrics.

    New Features

    • [TOOLS-2087] - Add server 6.1 metrics.
      • Adds aerospike_xdr_bytes_shipped.
      • Adds aerospike_sindex_entries_per_bval.
      • Adds aerospike_sindex_entries_per_rec.
    • [TOOLS-2132] - Replace latency panels with heat maps and percentiles.

  • 2.1.0
    Release Date: July 19, 2022
    • The version 2.1.0 Grafana dashboards are not backwards compatible with Aerospike servers older than 6.0.0.0.
    • Aerospike Monitoring Stack version 2.1.0 adds support for the batch-index latency metrics aerospike_latencies_batch_index_us_bucket and aerospike_latencies_batch_index_us_count.
    • Aerospike Prometheus Exporter version 1.7.0 or greater must be used to get the batch-index latency metrics.

    New Features

    • [TOOLS-2069] - Add batch-index latency panels.

  • 2.0.0
    Release Date: June 10, 2022
    • The version 2.0.0 Grafana dashboards are not backwards compatible with Aerospike servers older than 6.0.0.0.
    • Aerospike Monitoring Stack version 2.0.0 adds support for many new Aerospike 6.0 metrics in the Grafana dashboards, including the following:
      • Primary index queries.
      • Secondary Index queries.
      • Batch sub-transactions (non-proxied).
      • Add overall reads/writes (client_read/write_success + batch_sub_read/write_success) to cluster, node, and namespace dashboards.
      • New job information such as job type.
      • si-query and pi-query latencies.
      • Add memory_used stats to the SIndex dashboard, and remove the many SIndex metrics dropped in Aerospike Server version 6.0.
      • Remove any mention of scans.
      • Other miscellaneous changes. See pull request 33 for more details.
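
    The overall reads/writes figures combine single-record and batch sub-transaction successes, following the formula in the list above (client_read/write_success + batch_sub_read/write_success). A minimal sketch of that sum over one node's stats; the function name is hypothetical, and treating a missing stat as 0 is an assumption.

```python
def overall_successes(stats: dict[str, int]) -> dict[str, int]:
    """Sum client and batch sub-transaction successes for reads and writes.

    Missing stats are treated as 0 (an assumption, e.g. for nodes that do
    not report batch sub-transaction counters).
    """
    return {
        "reads": stats.get("client_read_success", 0)
        + stats.get("batch_sub_read_success", 0),
        "writes": stats.get("client_write_success", 0)
        + stats.get("batch_sub_write_success", 0),
    }


if __name__ == "__main__":
    node_stats = {
        "client_read_success": 100,
        "batch_sub_read_success": 40,
        "client_write_success": 10,
    }
    print(overall_successes(node_stats))
```

    Counting only the client_* stats would undercount traffic from applications that rely heavily on batch operations.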

    New Features

    • [TOOLS-2044] - Display Aerospike 6 metrics.

  • 1.4.0
    Release Date: March 14, 2022

    New Features

    • [TOOLS-1956] - Add Jobs View and Secondary Index View dashboards
      • [TOOLS-1946] - Add support for per-job scan and query statistics
      • [TOOLS-1947] - Add support for secondary index statistics

  • 1.3.2
    Release Date: September 7, 2021

    Improvements

    • [TOOLS-1785] - Add new metrics introduced in Aerospike 5.7.

  • 1.3.1
    Release Date: June 15, 2021

    Improvements

    • Adds "Exporters View" dashboard to track status of all Aerospike Prometheus Exporter targets.

    Bug Fixes

    • [TOOLS-1721] - Fixes incorrect status of the exporters and Aerospike nodes in the "Node View" dashboard.

  • 1.3.0
    Release Date: June 4, 2021

    Bug Fixes

    • Fixed 90th percentile latency computation in Latency View dashboard to not use rate(). Thanks to @ashangit for the contribution.
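
    A percentile such as the 90th can be estimated from cumulative latency buckets (for example aerospike_latencies_read_ms_bucket, listed in the 2.5.0 notes) by finding the bucket that contains the target rank and interpolating inside it. The sketch below mirrors the linear interpolation of Prometheus histogram_quantile(), applied to raw cumulative counts; it assumes the buckets are cumulative and sorted by upper bound, and it is not the dashboard's actual query.

```python
import math


def quantile_from_buckets(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound; the last bound may be +Inf.
    The lower bound of the first bucket is taken to be 0.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                return lower_bound  # cannot interpolate into an open bucket
            in_bucket = count - lower_count
            if in_bucket == 0:
                return upper_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


if __name__ == "__main__":
    # Hypothetical cumulative read-latency buckets in milliseconds.
    read_ms = [(1, 50), (2, 80), (4, 95), (math.inf, 100)]
    print(quantile_from_buckets(0.9, read_ms))
```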

  • 1.2.1
    Release Date: January 27, 2021

    Improvements

    • Added DC nodes metric to XDR dashboard.

  • 1.2.0
    Release Date: November 16, 2020

    New Features

    • [TOOLS-1589] - Migrate dashboards to Grafana 7.

    Improvements

    • [TOOLS-1591] - Make datasource configurable through a dashboard variable. Thanks to realmgic (Zohar) for the contribution.
    • [TOOLS-1588] - Alert when close to stop writes, when a node is proxying, and when XDR lag is above a threshold.
    • [TOOLS-1590] - Add Prometheus Docker Swarm service discovery config to the example.

    Bug Fixes

    • [TOOLS-1592] - Fix units for "Failure rate" panel in Namespace view.

  • 1.1.1
    Release Date: August 31, 2020

    Improvements

    • Use the latency time unit in queries to support Aerospike's microsecond histograms, and add a variable for the latency time unit to the Latency View and Node Overview dashboards.
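
    Latency histograms can be reported in milliseconds or microseconds, and the unit appears in the metric name (aerospike_latencies_read_ms_bucket versus aerospike_latencies_batch_index_us_bucket, both named in later notes). A minimal sketch of normalizing a bucket bound to milliseconds; parsing the unit from the second-to-last name token is an assumption, and the dashboards handle this with a Grafana variable rather than code like this.

```python
# Divisor that converts a bucket bound in the metric's unit to milliseconds.
UNIT_DIVISOR_TO_MS = {"ms": 1, "us": 1000}


def bound_in_ms(metric_name: str, le: float) -> float:
    """Convert a latency bucket bound (the le label) to milliseconds.

    Assumes the unit is the second-to-last underscore-separated token,
    e.g. aerospike_latencies_batch_index_us_bucket -> "us".
    """
    unit = metric_name.rsplit("_", 2)[-2]
    if unit not in UNIT_DIVISOR_TO_MS:
        raise ValueError(f"unknown latency unit in {metric_name!r}")
    return le / UNIT_DIVISOR_TO_MS[unit]


if __name__ == "__main__":
    print(bound_in_ms("aerospike_latencies_batch_index_us_bucket", 1024))
    print(bound_in_ms("aerospike_latencies_read_ms_bucket", 8))
```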

    Bug Fixes

    • Refresh variables on time range change.

  • 1.1.0
    Release Date: July 27, 2020

    New Features

    • Add description info to each dashboard panel.
    • Add clock_skew_stop_writes to Namespace View and Cluster View dashboards.
    • Add dashboard support for the new latency metrics change in Aerospike Prometheus Exporter v1.1.0.
    • Show primary index usage for namespaces using index-type flash or pmem.

    Bug Fixes

    • Fix primary index usage panel to show values in MiB/GiB.
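
    MiB and GiB are binary (1024-based) units, so a raw byte count is divided by 1024 until it fits. A minimal sketch of that formatting; the function name, the two-decimal precision, and the unit ladder are illustrative choices, not the panel's actual configuration.

```python
def format_binary_bytes(n: float) -> str:
    """Format a byte count with binary (1024-based) units such as MiB or GiB."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(n)
    for unit in units[:-1]:
        if abs(value) < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024
    return f"{value:.2f} {units[-1]}"


if __name__ == "__main__":
    print(format_binary_bytes(3 * 1024**3))
```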