Aerospike Monitoring Stack Release Notes
-
Release Date: September 25, 2024
-
Release Date: July 24, 2024
-
Release Date: June 19, 2024
-
Release Date: May 9, 2024
-
Release Date: April 9, 2024
-
Release Date: March 5, 2024
-
Release Date: February 13, 2024
-
Release Date: January 10, 2024
-
Release Date: December 12, 2023
-
Release Date: November 9, 2023
- Aerospike Monitoring Stack version 3.0.0 is a major upgrade with improved dashboards designed to be forward and backward compatible with Aerospike versions.
- Establishes a consistent design pattern with status displayed at the top and detailed time ranges displayed below.
- All dashboards and alerts are forward compatible to 7.x versions and backwards compatible to 5.x versions.
- Removed unused and deprecated dashboards (alerts, exporter, and jobs).
Breaking Changes
- Alert severity is now a string: critical, error, warn, or info. The earlier number-based severity is deprecated.
- New alerts for 7.0 metrics, connectors, and bug fixes use string-type severity only.
- Removed the 3 deprecated dashboards: Alerts, Exporters, and Jobs.
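With severities now expressed as strings, anything that sorts or filters alerts has to rank them by name rather than by number. The Python sketch below shows one way to do that, using the decreasing order critical, error, warn, info stated in these notes. The alert dictionaries and their "severity" key are hypothetical inputs for illustration, not the stack's actual alert schema; legacy numeric or missing severities are simply ranked last.

```python
# Severity levels in decreasing order, as listed in these release notes.
SEVERITY_ORDER = ["critical", "error", "warn", "info"]
SEVERITY_RANK = {name: rank for rank, name in enumerate(SEVERITY_ORDER)}
UNKNOWN_RANK = len(SEVERITY_ORDER)  # legacy numeric or missing severities


def severity_rank(alert):
    """Rank an alert by its (hypothetical) "severity" key; lower is more severe."""
    return SEVERITY_RANK.get(alert.get("severity"), UNKNOWN_RANK)


def sort_by_severity(alerts):
    """Return alerts ordered from most to least severe; unknown values sort last."""
    return sorted(alerts, key=severity_rank)


def at_least(alerts, minimum):
    """Keep only alerts at least as severe as `minimum` (for example "warn")."""
    threshold = SEVERITY_RANK[minimum]
    return [a for a in alerts if severity_rank(a) <= threshold]
```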
New Features
- [OM-116] - Add Dynatrace to the OTel examples.
- Added documentation and example OTel Collector configurations for integrating Aerospike metrics with Dynatrace.
- [OM-127] - Node View - Handle 7.0 metric changes.
- Revamped the dashboard for the 7.0 metrics: it displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
- Data, index, and memory metrics are shown as minimum, average, and maximum values, making anomalies across namespaces easy to identify.
- [OM-128] - Namespace View - Handle 7.0 metric changes.
- Revamped the dashboard for the 7.0 metrics: it displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
- Data, index, and memory metrics are shown as minimum, average, and maximum values, making anomalies across all nodes easy to identify.
- [OM-129] - Unique Data View - Handle 7.0 metric changes.
- Revamped the dashboard for the 7.0 metrics: it displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
- Displays usage across clusters and historical usage for each cluster.
- Data is displayed in three layers: 1. all-clusters, 2. single-cluster, 3. by namespace in each cluster.
- [OM-130] - Update alert rules to handle 7.0 metric changes.
- Enhanced alerts to use 7.x metrics and renamed the previous alerts with a "pre7x" prefix.
- Alerts added, modified, and renamed:
- Modified - NamespaceDataCloseToStopWrites, LowDataAvailWarning, LowDataAvailCritical.
- Added - HighDataUseNamespaceWarning, HighDataUseNamespaceCritical.
- Renamed - pre7x_NamespaceSetQuotaWarning, pre7x_NamespaceSetQuotaAlertCritical.
- [OM-133] - Set Index - Handle 7.0 metric changes.
- Revamped the dashboard for the 7.0 metrics: it displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
- Data metrics are shown as minimum, average, and maximum values, making anomalies across all nodes easy to identify.
- [OM-134] - All Flash - Handle 7.0 metrics.
- Revamped the dashboard for the 7.0 metrics: it displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
- [OM-135] - Rolling Restart Dashboard - Handle 7.0 metric changes.
- Revamped the dashboard for the 7.0 metrics: it displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
- Data and memory panels now show both the top-k and bottom-k consumers, representing over-utilized and under-utilized resources respectively.
- [OM-136] - Cluster view - Handle 7.0 metric changes.
- Revamped the dashboard for the 7.0 metrics: it displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
- [OM-139] - Handle 7.0 - Multi cluster view.
- Revamped the dashboard for the 7.0 metrics: it displays the build version and alerts by severity, and splits data, index, and memory metrics into separate panels.
- Topology diagram now shows dashboards and connectors using different diagrams.
- [OM-144] - Remove deprecated job and alerts (old) dashboards.
- Removed the 3 deprecated dashboards: Jobs, Exporter, and Alerts. The Alerts dashboard was replaced by the new Alerts View dashboard in a previous release.
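The minimum/average/maximum and top-k/bottom-k views described above can be illustrated with a small Python sketch. It works on a hypothetical mapping of node (or namespace) name to a single metric value, such as a data-used percentage. In the dashboards these aggregations are performed by Prometheus queries, which this sketch does not attempt to reproduce.

```python
from statistics import mean


def min_avg_max(values_by_node):
    """Summarize one metric across nodes as (minimum, average, maximum).

    A large gap between the minimum or maximum and the average points to
    an outlier node worth investigating.
    """
    values = list(values_by_node.values())
    return min(values), mean(values), max(values)


def top_and_bottom_k(values_by_node, k):
    """Return the k highest and k lowest (node, value) pairs.

    The highest values suggest over-utilized nodes; the lowest values
    (listed lowest first) suggest under-utilized ones.
    """
    ranked = sorted(values_by_node.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]
```

For example, with values of 40, 42, 91, and 12 across four nodes, the average of 46.25 hides the fact that one node is near 91 while another sits at 12; the minimum and maximum make both outliers visible.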
Bug Fixes
- [OM-140] - Standardized alert severity colors and fixed a bug so that the info alert count now displays correctly.
- [OM-114] - AMS - Change ordering of memory free pct graph on Rolling Restart dashboard.
- [OM-74] - Avoid average function in namespace dashboard.
- [OM-105] - Monitoring dashboard "Namespace" does not show namespace level values.
- [OM-109] - Improve Dashboard Queries and Linting.
-
Release Date: September 20, 2023
- The v2.8.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- This release includes 1 major feature: connector dashboards, alerts, and topology.
- Aerospike Monitoring Stack version 2.8.0 adds 2 dashboards, alerts, and bug fixes:
- 2 dashboards to monitor connectors and connector JVM metrics.
- Enhanced alerts covering key connector metric thresholds and JVM health.
- NOTE:
- Aerospike Prometheus exporter 1.13.0 or greater must be used to get the Aerospike 6.4 metrics.
- The Multi-Cluster View dashboard now requires the Diagram Panel plugin.
New Features
- [OM-64] - Create predefined Prometheus alert rules for Connectors.
- This release includes 6 alerts covering the core functionality and process health of the connectors.
- Key alerts covered are connector-status, connector-request-lag, connector-request-errors, JVM heap, JVM CPU, and JVM GC.
- [OM-56] - Connectors alerts & Dashboards
- Connector View dashboard for monitoring 6 connectors.
- Connectors supported are xdr-proxy, kafka-outbound, pulsar-outbound, esp-outbound, elastic-search, and jms-outbound.
- Key metrics covered are request lag, request errors, success, skipped, connections, XDR record byte size, etc.
- [OM-107] - Create a dashboard for a Connector(s)
- Connector JVM View dashboard for monitoring the JVM health of 6 connectors.
- Connectors supported are xdr-proxy, kafka-outbound, pulsar-outbound, esp-outbound, elastic-search, and jms-outbound.
- Key metrics covered are uptime, CPU, memory, threads, files, classes, and buffers.
- The Multi-Cluster View dashboard is enhanced to display the Aerospike Server topology using the cluster-name and XDR dc configurations.
- NOTE:
- To view the data replication topology in the Multi-Cluster View dashboard, cluster-name must be configured, and each destination cluster's cluster-name must be used as the dc name in the xdr section of the Aerospike Server configuration.
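To see why this naming convention matters, the hypothetical Python sketch below derives replication edges from the two settings: each XDR dc name is matched against the known cluster-name values, and only matches become edges. The input structure (cluster-name mapped to its list of dc names) is an assumption for illustration; this is not how the Multi-Cluster View dashboard is actually implemented.

```python
def replication_edges(clusters):
    """Build (source, destination) replication edges.

    `clusters` maps each cluster-name to the XDR dc names configured on
    that cluster. Following the convention in these notes, a dc name is
    expected to equal the destination cluster's cluster-name, so an edge
    is drawn only when the dc name matches a known cluster.
    """
    edges = []
    unmatched = []
    for source, dc_names in clusters.items():
        for dc in dc_names:
            if dc in clusters:
                edges.append((source, dc))
            else:
                # The dc name does not match any cluster-name: no edge can be drawn.
                unmatched.append((source, dc))
    return edges, unmatched
```

A dc named something other than the destination's cluster-name ends up in `unmatched`, which is the situation the note above asks you to avoid.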
Bug Fixes
- [OM-122] - Avoid duplicate defrag metric values on the namespace dashboard.
- [OM-113] - Namespace view dashboard - average objects per sprig stat.
- [OM-120] - Add high-water mark breached to the Rolling Restart dashboard.
-
Release Date: August 28, 2023
- The v2.7.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- This release includes 2 major features: enhanced alerts and the All Flash use-case dashboard.
- Aerospike Monitoring Stack version 2.7.0 adds a new dashboard, enhanced alerts, and bug fixes:
- All Flash dashboard, showing key metrics to monitor when using flash storage for both the primary index and secondary indexes (sindex).
- Enhanced alerts covering various server metrics; this release covers alerts on namespaces, XDR, latencies, best-practices checks, node-exporter, etc.
- Aerospike Prometheus exporter 1.13.0 or greater must be used to get the Aerospike 6.4 metrics.
New Features
- [OM-104] - Add new XDR bytes-shipped metrics to dashboards.
- Displays bytes-shipped as both a stat and a time series to help monitor replication progress.
- [OM-98] - Observability & Management Alerts - Enhance / enrich prometheus alerts from ACMS.
- This release includes 40 alerts covering various Aerospike Server metrics. Key areas include:
- Namespaces, latencies, data replication (XDR), sets, node-exporter, flash, best-practices checks, etc.
- [OM-93] - Use-case Dashboard: all-flash.
- A new use-case dashboard that focuses on key metrics and alerts related to flash usage.
- Key metrics include average objects per sprig, index pressure, primary index flash, secondary index flash, etc.
- [OM-48] - Use-case Dashboard Organization & Naming.
- Added brief descriptions on each dashboard and updated tags to identify each dashboard easily.
- [OM-111] - Observability dashboard unit tests.
- Created a framework to test dashboards automatically, including panels, expressions/queries, layout, and expression results.
- [OM-103] - Add user stat related alerts.
- Added user-stat alerts covering connections, connection churn, etc.
- [OM-101] - Add warning for best practice failures.
- Raises an alert if best practices are not followed when setting up the Aerospike server. The server reports this flag after running a series of checks.
- [OM-102] - Add warning for node-exporter not being present.
- As a precursor to integrating node-exporter metrics into the Aerospike Monitoring Stack, this alert raises a warning in the Alerts View dashboard if node-exporter is not configured.
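Plotting bytes-shipped as a time series of replication progress typically means plotting its rate of change. The Python sketch below computes a simplified per-second rate from counter samples, treating any drop in value as a counter reset (for example, a node restart). It assumes bytes-shipped behaves as a cumulative counter, and the (timestamp, value) sample format is hypothetical. Prometheus' rate() additionally extrapolates to the edges of the query window, which this sketch omits.

```python
def per_second_rate(samples):
    """Approximate the per-second increase of a cumulative counter.

    `samples` is a list of (timestamp_seconds, value) pairs in time order.
    When the value drops, the counter is assumed to have reset, so the
    post-reset value itself is counted as the increase for that step.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0
```

With samples (0, 1000), (10, 3000), (20, 500), (30, 2500), the increases are 2000, 500 (after the reset), and 2000, giving 4500 bytes over 30 seconds, or 150 bytes per second.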
-
Release Date: August 3, 2023
- The v2.6.1 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- Aerospike Monitoring Stack version 2.6.1 adds bug fixes.
- Aerospike Prometheus exporter 1.12.0 or greater must be used to get the Aerospike 6.3 metrics.
- Deprecated
- Existing Alerts dashboard is deprecated and will be removed in future releases.
- Existing Jobs dashboard is deprecated and will be removed in future releases.
Bug Fixes
- [OM-100] - Issues in Multi-cluster view dashboard
- Corrected label and unit in XDR panel.
- Corrected links from XDR and Latencies to respective dashboards (instead of cluster-view).
- Added an alert-severity based filter.
- Issues in Alerts view
- Panel colors are corrected according to the severity types.
- Issues in Unique Data view
- Fixed unique data bytes not being shown correctly when custom labels are enabled in the configuration.
- Added a historical time series for unique data bytes.
-
Release Date: July 12, 2023
- The v2.6.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- This release eliminates hard-coded values for variables. As a result, users must ensure that the Aerospike Prometheus data source is selected as the default for dashboard data to populate correctly.
- Aerospike Monitoring Stack version 2.6.0 adds new dashboards and bug fixes:
- Rolling Restarts dashboard, showing key metrics to monitor during specific use cases.
- Alerts View dashboard, adopting more meaningful alert severity levels.
- Aerospike Prometheus exporter 1.12.0 or greater must be used to get the Aerospike 6.3 metrics.
- Deprecated
- Existing Alerts dashboard is deprecated and will be removed in future releases.
- Existing Jobs dashboard is deprecated and will be removed in future releases.
New Features
- [OM-79] - Rolling Restarts dashboard, with data shown in groups such as stats, errors, and resources.
- This dashboard curates key metrics to monitor during specific use cases, such as:
- Node restart
- Software upgrade
- Investigation
- etc.
- Resource utilization is displayed for the TopK major consumers at a service and namespace level.
- [OM-85] - Added the new Alerts View dashboard. It visualizes alerts by severity, showing both alert counts and individual alerts.
- Newly adopted alert levels, in decreasing order of severity: critical, error, warn, and info.
- This dashboard replaces the existing Alerts dashboard.
- [OM-82] - All Aerospike dashboards and panel visualizations are updated for Grafana 9.x.
- [OM-49] - Improved and reorganized Aerospike Monitoring stack examples:
- Moved the Docker Compose files into the relevant folders.
- Added examples of using AeroLab, which can spin up Aerospike clusters for proof-of-concept (POC) needs.
Bug Fixes
- [OM-82] - Includes bug fixes related to queries and visualizations:
- All queries now use a proper regex pattern to honor single-value or multi-value template variable selections.
- All time-series panels are adjusted to use range vectors.
- All dashboards have standardized template variables in the same order.
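As a rough illustration of why a regex matcher (=~) handles both single-value and multi-value selections, the Python sketch below mimics how a multi-value template variable is commonly expanded into an alternation such as (ns1|ns2) and matched against a label value. The helper names are hypothetical; Grafana performs this interpolation itself, and the exact escaping it applies may differ. The full-match check reflects that Prometheus label regex matchers are fully anchored.

```python
import re


def variable_regex(selected):
    """Build a regex alternation for one or more selected variable values.

    Each value is escaped so characters such as '.' match literally.
    """
    return "(" + "|".join(re.escape(v) for v in selected) + ")"


def label_matches(pattern, label_value):
    """Prometheus label regex matchers are fully anchored, so use fullmatch."""
    return re.fullmatch(pattern, label_value) is not None
```

A single selection such as ["test"] becomes (test), so the same =~ query works whether one value or several are selected, and anchoring keeps ns1 from also matching ns10.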
-
Release Date: June 19, 2023
- NOTE: The v2.5.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- Aerospike Monitoring Stack version 2.5.0 adds the new Multi-Cluster View dashboard, OTel integration examples, and bug fixes.
- NOTE: Aerospike Prometheus exporter 1.11.0 or greater must be used to get the Aerospike 6.3 metrics.
New Features
- [OM-45] - Added the new Multi-Cluster View dashboard. It visualizes multiple clusters across regions and data centers with a focus on health, and consists of 4 panels:
- Geomap panel - displays multiple clusters on a map.
- Cluster panel - displays key metrics like size, alerts, XDR lag, Read & Write latencies.
- Node panel - uses the Polystat plugin and displays nodes in Green or Red indicating the health.
- Namespace panel - displays namespaces in Green or Red indicating the health.
- Key metrics used in this dashboard:
- aerospike_node_up
- aerospike_namespace_objects
- aerospike_node_stats_cluster_size
- aerospike_xdr_lag
- aerospike_latencies_write_ms_bucket
- aerospike_latencies_read_ms_bucket
- [OM-60] - Added examples showing how to integrate the Aerospike Prometheus exporter with the OTel Collector and export metrics to partner solutions.
- Partner integration examples are provided for New Relic, Datadog, and CloudWatch.
Bug Fixes
- [OM-76] - In the Namespace dashboard, the Defrag row hides anomalies as a result of aggregation.
- Removed the Defrag row: aggregation was removed from the defrag panels, and the panels were moved to the namespace row to display defrag metrics for each namespace.
-
Release Date: May 16, 2023
- NOTE: The v2.4.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- Aerospike Monitoring Stack version 2.4.0 adds support for metrics introduced in Aerospike 6.3.
- NOTE: Aerospike Prometheus exporter 1.11.0 or greater must be used to get the Aerospike 6.3 metrics.
New Features
- [OM-62] - Added defrag metrics to the namespace view dashboard.
Bug Fixes
- [OM-61] - In the Namespace View dashboard, the NSUP cycle is summed instead of showing max/average.
- [OM-22] - Migration summary doubles up in cluster view dashboard.
-
Release Date: April 19, 2023
Bug Fixes
- [OM-37] - Issues in Set view, Unique data view, Sindex view, Namespace view and Node view:
- Fixed issue in "Set view" dashboard to remove hardcoded datasource.
- Re-exported the Set view, Unique data view, Sindex view, Namespace view, and Node view dashboards with the correct configurations so they are suitable to be made available in Grafana Cloud.
-
Release Date: April 3, 2023
- NOTE: The v2.3.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- Aerospike Monitoring Stack version 2.3.0 adds support for metrics introduced in Aerospike 6.3.
- NOTE: Aerospike Prometheus exporter 1.10.0 or greater must be used to get the Aerospike 6.3 metrics.
Features
- Added 6.3 metrics:
- Adds the aerospike_sindex_used_bytes secondary index metric.
- Adds the aerospike_namespace_nsup_cycle_deleted_pct NSUP metric.
- Adds the aerospike_sets_stop_writes_size set-level configuration.
- Updated the memory used panel in the secondary index view to use aerospike_sindex_used_bytes or aerospike_sindex_memory_used, as aerospike_sindex_memory_used is deprecated in Aerospike 6.3.
- Added nsup metrics panel to Namespace view dashboard.
- Added set level quotas panel to Namespace view dashboard.
- Added a new dashboard displaying set level metrics.
- Added a new dashboard displaying unique data usage.
- Added 4 new Prometheus alerts:
- NamespaceSupervisorFallingBehind: when NSUP is falling behind, and/or to display the length of time the most recent NSUP cycle lasted.
- NamespaceFreeMemoryCloseToStopWrites: when the memory of one of your Aerospike nodes is close to the stop-writes limit configured for a namespace.
- NamespaceSetQuotaWarning: when one of your Aerospike nodes is at 80% of the quota you have configured on a set.
- NamespaceSetQuotaAlert: when one of your Aerospike nodes is at 99% of the quota you have configured on a set.
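The two set quota alerts amount to a simple threshold check. The Python sketch below classifies one node's set usage against the 80% and 99% thresholds given in these notes. The inputs (a used-bytes value and a quota in bytes, for example a set's data size compared with its stop-writes-size) are assumptions about what the alert rules compare; a quota of 0 is treated here as no quota configured. The real rules are Prometheus expressions, not Python.

```python
def set_quota_alert(used_bytes, quota_bytes):
    """Classify a set's usage on one node against its configured quota.

    Thresholds follow these notes: 80% of the quota for
    NamespaceSetQuotaWarning and 99% for NamespaceSetQuotaAlert.
    Returns None when no quota is configured or usage is below 80%.
    """
    if quota_bytes <= 0:
        return None  # assumption: 0 means no quota on the set
    used_pct = 100.0 * used_bytes / quota_bytes
    if used_pct >= 99:
        return "NamespaceSetQuotaAlert"
    if used_pct >= 80:
        return "NamespaceSetQuotaWarning"
    return None
```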
-
Release Date: August 26, 2022
- The version 2.2.0 Grafana dashboards are not backwards compatible with Aerospike servers older than 6.0.0.0.
- Aerospike Monitoring Stack version 2.2.0 adds support for metrics introduced in Aerospike 6.1.
- Aerospike Prometheus Exporter version 1.8.0 or greater must be used to get the Aerospike 6.1 metrics.
New Features
- [TOOLS-2087] - Add server 6.1 metrics.
- Adds aerospike_xdr_bytes_shipped.
- Adds aerospike_sindex_entries_per_bval.
- Adds aerospike_sindex_entries_per_rec.
- [TOOLS-2132] - Replace latency panels with heat maps and percentiles.
-
Release Date: July 19, 2022
- The version 2.1.0 Grafana dashboards are not backwards compatible with Aerospike servers older than 6.0.0.0.
- Aerospike Monitoring Stack version 2.1.0 adds support for the batch-index latency metrics aerospike_latencies_batch_index_us_bucket and aerospike_latencies_batch_index_us_count.
- Aerospike Prometheus Exporter version 1.7.0 or greater must be used to get the batch-index latency metrics.
New Features
- [TOOLS-2069] - Add batch-index latency panels.
-
Release Date: June 10, 2022
- The version 2.0.0 Grafana dashboards are not backwards compatible with Aerospike servers older than 6.0.0.0.
- Aerospike Monitoring Stack version 2.0.0 adds support for many new Aerospike 6.0 metrics in the Grafana dashboards, including the following:
- Primary index queries.
- Secondary index queries.
- Batch sub-transactions (non-proxied).
- Add overall reads/writes (client_read/write_success + batch_sub_read/write_success) to cluster, node, and namespace dashboards.
- New job information such as job type.
- si-query and pi-query latencies.
- Add memory_used stats to SIndex dashboard, remove the many SIndex metrics dropped in Aerospike Server version 6.0.
- Remove any mention of scans.
- Other miscellaneous changes. See pull request 33 for more details.
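The overall reads/writes figure described above is a plain sum of two statistics. The Python sketch below shows that arithmetic on a hypothetical dictionary of stat values keyed by the stat names given in these notes; missing stats are treated as zero. The dashboards compute this from the exporter's Prometheus metrics, which this sketch does not model.

```python
def overall_success(stats, op):
    """Overall successful operations for op ("read" or "write").

    Sums client_<op>_success and batch_sub_<op>_success, as described in
    these notes. Missing stats are treated as zero.
    """
    return stats.get(f"client_{op}_success", 0) + stats.get(f"batch_sub_{op}_success", 0)
```

For example, 1200 client reads plus 300 batch sub-reads give 1500 overall successful reads.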
New Features
- [TOOLS-2044] - Display Aerospike 6 metrics.
-
Release Date: March 14, 2022
New Features
- [TOOLS-1956] - Add Jobs View and Secondary Index View dashboards
- [TOOLS-1946] - Add support for per-job scan and query statistics
- [TOOLS-1947] - Add support for secondary index statistics
-
Release Date: September 7, 2021
Improvements
- [TOOLS-1785] - Add new metrics introduced in Aerospike 5.7.
-
Release Date: June 15, 2021
Improvements
- Adds "Exporters View" dashboard to track status of all Aerospike Prometheus Exporter targets.
Bug Fixes
- [TOOLS-1721] - Fixes incorrect status of the exporters and Aerospike nodes in the "Node View" dashboard.
-
Release Date: June 4, 2021
Bug Fixes
- Fixed the 90th percentile latency computation in the Latency View dashboard to not use rate(). Thanks to @ashangit for the contribution.
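For context on percentile panels built from latency buckets, the Python sketch below estimates a percentile from cumulative histogram buckets using linear interpolation within the bucket that contains the target rank, similar to what Prometheus' histogram_quantile() does. It assumes the bucket values can be used directly without first applying rate(), which appears to be what this fix implies for these metrics. The function name and the (upper_bound, cumulative_count) input format are illustrative assumptions.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound; the last bound may be math.inf. Within the bucket that
    contains the target rank, the value is linearly interpolated.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the open-ended bucket: report its lower bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound
```

With buckets (1 ms, 50), (2 ms, 80), (4 ms, 95), (8 ms, 100), the 90th percentile rank of 90 falls in the 2 to 4 ms bucket, 10 of its 15 operations in, giving roughly 3.33 ms.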
-
Release Date: January 27, 2021
Improvements
- Added the DC nodes metric to the XDR dashboard.
-
Release Date: November 16, 2020
New Features
- [TOOLS-1589] - Migrate dashboards to Grafana 7.
Improvements
- [TOOLS-1591] - Make datasource configurable through a dashboard variable. Thanks to realmgic (Zohar) for the contribution.
- [TOOLS-1588] - Alert when close to stop-writes, when a node is proxying, and when XDR lag is above a threshold.
- [TOOLS-1590] - Add Prometheus' Docker Swarm service discovery config to the example.
Bug Fixes
- [TOOLS-1592] - Fix units for "Failure rate" panel in Namespace view.
-
Release Date: August 31, 2020
Improvements
- Use the latency time unit in queries to support Aerospike's microsecond histograms. Add a variable for the latency time unit to the Latency View and Node Overview dashboards.
Bug Fixes
- Refresh variables on time range change.
-
Release Date: July 27, 2020
Bug Fixes
- Fix primary index usage panel to show values in MiB/GiB.
-
Release Date: July 27, 2020