Metrics Unavailability Across All Instance Tiers

Incident Report for Neo4j Aura

Postmortem

What happened

Our regular monthly release (2025.04) made it possible for the DBMS metric log.appended_bytes to return a negative value. This became apparent because the release also shipped a new datastore version. When the metric returned a negative value, the metrics HTTP endpoint of the Neo4j DBMS failed.

The issue went undetected because it affected only a subset of instances, and it was corrected on instances that were subsequently rolled for another component change delivered as part of the release.

The issue therefore remained only on instances that were not restarted as part of that roll.

This issue also prevented us from fully collecting metrics from those DBMS instances, which impacted both monitoring of the instances and troubleshooting by engineers.

To resolve the issue, we filtered out the metric causing it and rolled the affected instances.
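As a rough illustration of what filtering out a single metric can look like, the sketch below drops every sample whose name appears in an exclude list. It is not the change we deployed. The one "name value" pair per line format, the function filter_metrics, the constant EXCLUDED_METRICS, and every metric name other than log.appended_bytes are assumptions made for the example.

```python
from typing import Iterable, List, Set

# Hypothetical exclude list. log.appended_bytes is the metric named in this
# report; the mechanism itself is an assumption for illustration.
EXCLUDED_METRICS: Set[str] = {"log.appended_bytes"}


def filter_metrics(lines: Iterable[str],
                   excluded: Set[str] = EXCLUDED_METRICS) -> List[str]:
    """Drop every sample whose metric name is in the exclude list.

    Assumes one "name value" pair per line, which is an illustrative
    format and not necessarily what the DBMS emits.
    """
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        name = stripped.split(maxsplit=1)[0]
        if name in excluded:
            continue  # the excluded metric never reaches the consumer
        kept.append(stripped)
    return kept


# Made-up sample data: only the first metric name comes from this report.
sample = [
    "log.appended_bytes -1024",
    "page_cache.hits 5321",
    "transaction.committed 88",
]
filtered = filter_metrics(sample)
print(filtered)
```

Filtering by name is a blunt tool: it hides the metric entirely, valid values included, which is why it suits a short-term mitigation more than a permanent fix.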

How the service was affected

Affected customers (a random subset of instances across tiers) could not retrieve any instance metrics via the endpoint customer-metrics-api.neo4j.io. The built-in metrics in the monitoring section of the Aura console (console.neo4j.io) were affected as well.

What we are doing now

Following a review of the sequence of events and their impact, we have identified a number of actions to improve Neo4j Aura and to better prevent, detect, mitigate, and handle any similar issue.

  • Prevention

    • Improve metrics endpoint robustness: requests should not fail if a single metric is invalid
    • Fix and prevent negative counters for metrics and log occurrences
  • Detection

    • Implement an alert on metrics not being collected successfully 
  • Mitigation, handling and troubleshooting

    • Improve access to raw metrics from any instance
    • Provide a configuration option to exclude a metric 
  • Communication

    • Represent the metrics endpoint on the status page and work towards automating the reporting of its status
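
The first prevention item above, a metrics request that does not fail when one metric is invalid, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Neo4j implementation. The function render_metrics, the reader callables, the "name value" output format, and every metric name other than log.appended_bytes are hypothetical. Each metric is read in isolation, negative counters are treated as invalid, and skipped names are returned so they can be logged or alerted on.

```python
from typing import Callable, Dict, List, Tuple


def render_metrics(
    readers: Dict[str, Callable[[], float]]
) -> Tuple[List[str], List[str]]:
    """Read each counter independently so one bad value cannot fail the
    whole response. Returns (rendered lines, names of skipped metrics)."""
    lines: List[str] = []
    skipped: List[str] = []
    for name, read in sorted(readers.items()):
        try:
            value = read()
            if value < 0:
                # A counter should never be negative; treat it as invalid.
                raise ValueError(f"counter {name} is negative: {value}")
        except Exception:
            # Isolate the failure to this one metric and keep going.
            skipped.append(name)
            continue
        lines.append(f"{name} {value}")
    return lines, skipped


def _broken() -> float:
    # Simulates a metric whose reader raises instead of returning a value.
    raise RuntimeError("reader failed")


# Made-up readers: only log.appended_bytes is a name taken from this report.
readers = {
    "log.appended_bytes": lambda: -1024,
    "page_cache.hits": lambda: 5321,
    "broken.metric": _broken,
}
lines, skipped = render_metrics(readers)
print(lines)
print(skipped)
```

Returning the skipped names rather than silently dropping them keeps the failure visible, so the same list could feed the detection alert on metrics that are not collected successfully.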
Posted Jun 04, 2025 - 15:54 UTC

Resolved

The fix implemented by our engineers has resolved the issue. Users can once again use Aura Metrics.
Posted May 07, 2025 - 21:24 UTC

Monitoring

We have rolled out the fix and the metrics are available again. We will monitor for some time.
Posted May 07, 2025 - 16:47 UTC

Update

We have identified the issue and have a change that we will roll out shortly.
Posted May 07, 2025 - 15:40 UTC

Identified

We have identified the issue and have a change that we will roll out shortly.
Posted May 07, 2025 - 13:43 UTC

Update

We are still investigating the issue affecting the availability of metrics. As a result, you may experience difficulties monitoring your instances or accessing metric data. We appreciate your patience as we work to resolve it.
Posted May 07, 2025 - 10:09 UTC

Update

We have identified an issue impacting the availability of metrics on instances across all tiers. Our team is currently investigating the root cause. In the meantime, you may experience difficulties monitoring your instances or accessing metric data.
Posted May 07, 2025 - 08:20 UTC

Investigating

We have identified an issue impacting the availability of metrics on instances across all tiers. Our team is currently investigating the root cause. In the meantime, you may experience difficulties monitoring your instances or accessing metric data.
Posted May 07, 2025 - 08:18 UTC
This incident affected: AuraDB Virtual Dedicated Cloud (*.databases.neo4j.io) (AuraDB Virtual Dedicated Cloud on AWS (*.databases.neo4j.io), AuraDB Virtual Dedicated Cloud on Azure (*.databases.neo4j.io), AuraDB Virtual Dedicated Cloud on GCP (*.databases.neo4j.io)), AuraDB Professional (*.databases.neo4j.io) (AuraDB Professional on AWS (*.databases.neo4j.io), AuraDB Professional on Azure (*.databases.neo4j.io), AuraDB Professional on GCP (*.databases.neo4j.io)), AuraDS (*.databases.neo4j.io) (AuraDS on AWS (*.databases.neo4j.io), AuraDS on Azure (*.databases.neo4j.io), AuraDS on GCP (*.databases.neo4j.io)), AuraDS Enterprise (*.databases.neo4j.io) (AuraDS Enterprise on AWS (*.databases.neo4j.io), AuraDS Enterprise on Azure (*.databases.neo4j.io), AuraDS Enterprise on GCP (*.databases.neo4j.io)), AuraDB Business Critical (*.databases.neo4j.io) (AuraDB Business Critical (*.databases.neo4j.io) on AWS, AuraDB Business Critical (*.databases.neo4j.io) on Azure, AuraDB Business Critical (*.databases.neo4j.io) on GCP), and AuraDB Free (*.databases.neo4j.io).