Customers started to report errors accessing the Console and the Aura API on 2023-12-28 at 13:50 UTC. An incident was raised, and our engineering teams identified timeouts on requests to GCP Datastore as the cause of the unavailability. We were particularly affected because of the version of the Python Datastore driver we were using. We raised a support ticket with our cloud service provider (GCP), and in the meantime our SREs identified the issue and mitigated it. Service availability was restored at around 17:30 UTC on 2023-12-28.
Both the Aura Console and API make use of the GCP Datastore service for user management.
Authentication to the Console succeeded, but loading the Aura tenant information failed, which blocked the Aura Console UI from being displayed. The Aura API also operates at the tenant level and was impacted in the same way: its requests started timing out, making it unavailable.
We have taken steps to update our GCP Datastore driver version in line with GCP's recommendation, and to better handle outages and timeouts on certain queries so that they no longer block other processing. We will also implement a circuit breaker in our logic to reduce the impact of any future outage. Finally, we will look into improving our detection and alerting for GCP Datastore service outages.
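To illustrate the circuit-breaker idea mentioned above, here is a minimal Python sketch written under stated assumptions. It is not the implementation used in Aura. `CircuitBreaker`, `load_tenant_info`, the thresholds, and the `fetch` callable are all hypothetical. `fetch` stands in for a tenant lookup against Datastore and is assumed to raise an exception, for example on timeout, when the backend is unhealthy. The breaker does not impose timeouts itself; it assumes the driver call already has one. After `failure_threshold` consecutive failures, the breaker "opens" and fails fast for `reset_timeout` seconds. It then lets a single trial call through ("half-open") before closing again.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Hypothetical minimal breaker: closed -> open -> half-open -> closed.

    Not thread-safe; a production version would need locking.
    """

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # cool-down elapsed: allow a trial call
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            # Fail fast instead of waiting on a backend known to be unhealthy.
            raise CircuitOpenError("circuit open; skipping call")
        try:
            result = func()
        except Exception:
            self._record_failure()
            raise
        # Any success (including a half-open trial) closes the breaker.
        self._failures = 0
        self._opened_at = None
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            # Trip (or re-trip) the breaker and restart the cool-down window.
            self._opened_at = self._clock()


def load_tenant_info(breaker: CircuitBreaker, fetch: Callable[[], T], fallback: T) -> T:
    """Hypothetical wrapper: return a fallback instead of blocking on failure."""
    try:
        return breaker.call(fetch)
    except Exception:  # includes CircuitOpenError and the driver's timeout error
        return fallback
```

With this design, once Datastore has failed `failure_threshold` times in a row, later tenant lookups return the fallback immediately instead of each waiting for a timeout. The UI or API can then degrade instead of hanging.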