Customer Metrics Integration (CMI) unavailable
Incident Report for Neo4j Aura
Postmortem

What happened

On 2025-01-15 at 15:42:02 UTC, the secure endpoint that provides Customer Metrics Integration became unavailable and returned the error {"message":"Failed to validate JWT."} due to the expiry of the SSL certificate. Although we had renewed the certificate, our framework for deploying and rolling out components had updated the service accounts but had not updated the associated service account secret key in the correct sequence.
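The sequencing problem can be illustrated with a minimal, hypothetical model. It assumes a service account that accepts a set of key IDs and a secret store from which clients read the key they authenticate with. `ServiceAccount`, `SecretStore`, `rotate_key`, and `rotate_key_unsafe` are illustrative names only and do not describe Neo4j's deployment framework. The point is the ordering: the old key should stay valid until the secret has been switched to the new one.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceAccount:
    """Hypothetical service account that accepts a set of key IDs."""
    name: str
    valid_keys: set = field(default_factory=set)


@dataclass
class SecretStore:
    """Hypothetical secret store mapping secret names to key IDs."""
    secrets: dict = field(default_factory=dict)


def is_authenticated(account: ServiceAccount, store: SecretStore, secret_name: str) -> bool:
    # Clients authenticate with whatever key the secret currently holds.
    return store.secrets.get(secret_name) in account.valid_keys


def rotate_key(account: ServiceAccount, store: SecretStore, secret_name: str, new_key: str) -> list:
    """Safe ordering: record whether authentication works after each step."""
    old_key = store.secrets.get(secret_name)
    checks = []
    # Step 1: register the new key alongside the old one.
    account.valid_keys.add(new_key)
    checks.append(is_authenticated(account, store, secret_name))
    # Step 2: point the secret at the new key.
    store.secrets[secret_name] = new_key
    checks.append(is_authenticated(account, store, secret_name))
    # Step 3: only now revoke the old key.
    account.valid_keys.discard(old_key)
    checks.append(is_authenticated(account, store, secret_name))
    return checks


def rotate_key_unsafe(account: ServiceAccount, store: SecretStore, secret_name: str, new_key: str) -> list:
    """Unsafe ordering: the account is updated before the secret is."""
    checks = []
    # The account now accepts only the new key, but the secret still holds the old one.
    account.valid_keys = {new_key}
    checks.append(is_authenticated(account, store, secret_name))
    # The secret is updated afterwards; authentication fails until this step completes.
    store.secrets[secret_name] = new_key
    checks.append(is_authenticated(account, store, secret_name))
    return checks
```

In the safe ordering every intermediate check succeeds, whereas the unsafe ordering leaves a window in which clients holding the old secret are rejected.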

How the service was affected

Customers collecting and ingesting metrics from their Neo4j Aura instances were no longer able to do so because they could not connect to the provided endpoints. Connections to customer-metrics-api.neo4j.io require valid encryption with an up-to-date certificate, so these connections failed.

We detected the issue internally while rolling out an update, and shortly afterwards received reports from customers. We then created a new service account and associated secret key so that they could be rolled out immediately.

What we are doing now

We recognise that this incident seriously disrupted the monitoring and operation of Neo4j Aura instances, and we have committed to the following actions:

  • Monitoring: We have built additional monitoring metrics and dashboards, and derived alarms for our cloud operations team, to detect failed connections.
  • Mitigation: We are improving how we roll out changes to service account secret keys.
  • Prevention: When deleting a service account secret key and replacing it with a new one, we will no longer bundle this work into a single change; instead, we will split it into separate tasks.
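A certificate-expiry check is one kind of check that could sit alongside failed-connection alarms. The following is a minimal sketch using only Python's standard `ssl` and `socket` modules; the function names and the 14-day alert threshold are assumptions for illustration, not a description of Neo4j's actual monitoring.

```python
import socket
import ssl


def seconds_until_expiry(not_after: str, now: float) -> float:
    """Seconds between `now` (epoch seconds) and a certificate's notAfter field."""
    # ssl.cert_time_to_seconds parses the "%b %d %H:%M:%S %Y GMT" format
    # returned in the "notAfter" entry of SSLSocket.getpeercert().
    return ssl.cert_time_to_seconds(not_after) - now


def should_alert(not_after: str, now: float, threshold_days: float = 14.0) -> bool:
    """True if the certificate expires within `threshold_days` or has already expired.

    The 14-day default is an arbitrary, illustrative threshold.
    """
    return seconds_until_expiry(not_after, now) < threshold_days * 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the certificate served by host:port.

    If the certificate is already invalid (for example, expired), the TLS
    handshake raises ssl.SSLCertVerificationError, a subclass of OSError,
    which a monitor should treat as a failed connection.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `should_alert(fetch_not_after("customer-metrics-api.neo4j.io"), time.time())` (after `import time`) returns `True` when the certificate expires within 14 days. If the certificate has already expired, `fetch_not_after` raises instead of returning, so the caller should catch `OSError` and raise an alarm.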
Posted Feb 07, 2025 - 11:47 UTC

Resolved
A fix has been in place for some time and this incident is considered resolved. A postmortem will be forthcoming.
Posted Jan 15, 2025 - 22:17 UTC
Monitoring
A fix has been rolled out and we are monitoring to ensure CMI is fully operational for all instances.
Posted Jan 15, 2025 - 19:39 UTC
Identified
The Neo4j Aura Customer Metrics Integration (CMI) is currently unavailable. We have identified a fix and are preparing to roll it out.
Posted Jan 15, 2025 - 18:20 UTC
This incident affected:
  • AuraDB Virtual Dedicated Cloud (*.databases.neo4j.io) on AWS, Azure, and GCP
  • AuraDS Enterprise (*.databases.neo4j.io) on AWS, Azure, and GCP
  • AuraDB Business Critical (*.databases.neo4j.io) on AWS, Azure, and GCP