CMI Feature is not responding to requests

Incident Report for Neo4j Aura

Postmortem

What happened

On March 31, 2025, we experienced an incident that temporarily affected the availability of the Customer Metrics Integration service for Aura instances. This issue arose due to a change in our internal systems that inadvertently caused errors in processing customer requests. As a result, customers received error messages instead of the expected metrics. Our team quickly identified and resolved the problem. By rolling back the recent changes, we restored service functionality within 2.5 hours. We are committed to improving our processes to prevent similar issues in the future and ensure a seamless experience for our customers.

Root Cause

We encountered an issue due to the way our services interact with each other. The structure of one of the Aura Console API response fields changed, which caused unexpected dependency problems. We're actively working to improve the reliability of our services to ensure a smoother experience for you in the future.
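
As an illustration of this class of failure, the following is a minimal sketch of how a consuming service might check an upstream response against the shape it expects, so that a structural change surfaces as an explicit error. It is not Neo4j's implementation: the field names, the `ContractError` type, and the `validate_response` helper are all hypothetical and do not reflect the real Console API schema.

```python
from typing import Any


class ContractError(Exception):
    """Raised when an upstream response no longer matches the expected shape."""


# Hypothetical contract: field name -> expected Python type.
# These names are illustrative only, not the real Console API schema.
EXPECTED_FIELDS = {"instance_id": str, "tenant_id": str, "metrics_enabled": bool}


def validate_response(payload: dict[str, Any]) -> dict[str, Any]:
    """Return only the expected fields, or raise ContractError on a shape change."""
    validated = {}
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in payload:
            raise ContractError(f"missing field: {name}")
        value = payload[name]
        if not isinstance(value, expected_type):
            raise ContractError(
                f"field {name!r} has type {type(value).__name__}, "
                f"expected {expected_type.__name__}"
            )
        validated[name] = value
    return validated


# A response in the shape the consumer was written against (hypothetical).
old_shape = {"instance_id": "abc123", "tenant_id": "t-1", "metrics_enabled": True}
# The same response after one field's structure changed from a string to an object.
new_shape = {"instance_id": {"id": "abc123"}, "tenant_id": "t-1", "metrics_enabled": True}
```

Calling `validate_response(old_shape)` returns the payload unchanged, while `validate_response(new_shape)` raises `ContractError` naming the changed field. Running a check like this in pre-release tests is one way such a change could be caught before it reaches production.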

Customer Impact

  • Customers experienced service disruptions as they were unable to access metrics for their Aura instances.
  • All customer requests returned an HTTP 401 error, preventing users from retrieving necessary information.

Resolution

We rolled back the recent changes to restore service functionality quickly. Following that, we implemented changes in the Console API to reduce dependencies and redeployed them successfully.

What we are doing now

Neo4j remains committed to providing reliable service and is implementing additional safeguards to prevent similar incidents in the future.

New mitigations being deployed:

  • Enhancing our incident response playbook to improve handling of similar situations.
  • Establishing a standardized internal Console API with clear versioning and documentation to prevent compatibility issues.
  • Implementing comprehensive testing measures to identify potential issues before they affect our production environment.
Posted Apr 10, 2025 - 21:39 UTC

Resolved

CMI functionality has been fully restored and this issue is considered resolved.
Posted Mar 31, 2025 - 15:36 UTC

Monitoring

A fix has been implemented and the CMI feature is responding normally. We will monitor for a short time before considering this resolved.
Posted Mar 31, 2025 - 14:08 UTC

Investigating

We are currently investigating an issue that impacts the CMI feature, where requests return HTTP 401.
Posted Mar 31, 2025 - 13:15 UTC
This incident affected: AuraDB Virtual Dedicated Cloud (*.databases.neo4j.io) (AuraDB Virtual Dedicated Cloud on AWS (*.databases.neo4j.io), AuraDB Virtual Dedicated Cloud on Azure (*.databases.neo4j.io), AuraDB Virtual Dedicated Cloud on GCP (*.databases.neo4j.io)), AuraDS Enterprise (*.databases.neo4j.io) (AuraDS Enterprise on AWS (*.databases.neo4j.io), AuraDS Enterprise on Azure (*.databases.neo4j.io), AuraDS Enterprise on GCP (*.databases.neo4j.io)), and AuraDB Business Critical (*.databases.neo4j.io) (AuraDB Business Critical (*.databases.neo4j.io) on AWS, AuraDB Business Critical (*.databases.neo4j.io) on Azure, AuraDB Business Critical (*.databases.neo4j.io) on GCP).