Aura Professional on GCP in europe-west1 experiencing availability issues
Incident Report for Neo4j Aura
Postmortem

What happened

Half of the Aura Professional environment on GCP in the ‘europe-west1’ region experienced instance availability issues.

As a result of an Aura component roll out, the Aura database ingress layer for two Professional tier environments, ‘europe-west1’ (GCP) and ‘eastus’ (Azure), was not automatically updating to reflect Neo4j cluster topology changes.

We initially called out only the ‘europe-west1’ (GCP) environment as affected, and provided a mitigation (use of an unaffected environment in that same region), but missed ‘eastus’ (Azure).

The recovery was to re-establish the connection at the database ingress level by restarting the database ingress pods in order to refresh the Neo4j cluster topology.

How the service was affected

Some Aura Professional tier customers who had database instances in the affected environments were impacted. This would have been seen as intermittent unavailability for the duration of the roll out.

What we are doing now

We conducted a root cause analysis and identified a known issue with the underlying third-party component used to implement the database ingress.

Actions we are taking based on this incident: 

  • Correction - we have implemented and rolled out the recommended fix for the known issue in the third-party component.
  • Detection - we are improving existing logs so that they indicate when similar events are occurring, and will detect and alert on them accordingly.
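
As an illustration of the detection action above, the following is a minimal sketch of one way such a check could work. It is not the Aura implementation: the class name, the member names, and the 60-second threshold are all hypothetical. It assumes a monitor can periodically read both the members the Neo4j cluster reports and the backends the ingress is routing to, and it raises an alert only when the two sets have disagreed for longer than a grace period, so that a normal, brief lag during a roll out does not alert.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional


@dataclass
class TopologyDriftDetector:
    """Hypothetical check: flags an ingress whose routing view lags the cluster."""

    # Assumed grace period; a real value would depend on normal roll out timings.
    max_stale_seconds: float = 60.0
    # Time at which the current disagreement was first observed, if any.
    _drift_started_at: Optional[float] = field(default=None, init=False)

    def observe(
        self,
        cluster_members: FrozenSet[str],
        ingress_backends: FrozenSet[str],
        now: float,
    ) -> bool:
        """Record one observation; return True when an alert should fire."""
        if cluster_members == ingress_backends:
            # Views agree again, so any earlier drift is considered resolved.
            self._drift_started_at = None
            return False
        if self._drift_started_at is None:
            # First observation of this disagreement: start the clock.
            self._drift_started_at = now
        return now - self._drift_started_at >= self.max_stale_seconds
```

Calling `observe` on each polling interval with the two member sets and a monotonic timestamp is enough to drive it. Tracking when the drift started, rather than alerting on any single mismatch, is the design choice that separates a stuck ingress from an ordinary topology change in progress.
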
Posted Jan 15, 2024 - 12:17 UTC

Resolved
The incident has been fully resolved now and all databases have recovered full functionality.
Posted Jan 04, 2024 - 13:53 UTC
Monitoring
We have issued a fix on our end and the issue is now addressed.
We will keep monitoring for a few minutes to confirm.
Posted Jan 04, 2024 - 13:30 UTC
Update
We are continuing to investigate this issue. We are working with the cloud provider.
Affected customers can download a dump of their database, launch a new instance in a different region (for example europe-west2 or europe-west3), and load the dump into it.
We currently have no ETA for a fix but will update regularly.
Posted Jan 04, 2024 - 13:08 UTC
Update
We are continuing to investigate this issue.
Affected customers can download a dump of their database, launch a new instance in a different region (for example europe-west2 or europe-west3), and load the dump into it.
We currently have no ETA for a fix but will update regularly.
Posted Jan 04, 2024 - 12:17 UTC
Update
We are continuing to investigate this issue.
Affected customers can download a dump of their database, launch a new instance in a different region (for example europe-west2 or europe-west3), and load the dump into it.
We currently have no ETA for a fix but will update regularly.
Posted Jan 04, 2024 - 11:32 UTC
Investigating
We are currently investigating an issue with some databases in GCP region europe-west1 becoming intermittently unavailable.
Posted Jan 04, 2024 - 10:46 UTC
This incident affected: AuraDB Professional (*.databases.neo4j.io).