Degraded performance of diary application in the UK/Europe Region

Incident Report for ResDiary

Postmortem

Incident Report

At around 17:35 on Friday the 24th of January, we began to receive a large number of concurrent requests from a venue on our European instance of ResDiary. This influx of requests was not malicious; however, the type of request being sent repeatedly was costly for our servers to process and resulted in large request queues.

At 18:23 this instance became unreachable for all clients due to the request queues on each server being flooded.

At around 18:29, the venue confirmed that they had acted to stop the requests being sent to our servers. Our request queue was actively cleared during this time to ensure that we recovered as quickly as possible. Although the app was unreachable for around 5 minutes, you may have noticed degraded performance for approximately 10 minutes following this outage.

Future Remediation

We’d like to sincerely apologise to any clients affected during this outage. We understand that your service to your own clients was likely affected by this incident and we are taking steps to prevent an incident like this from recurring.

We will soon implement a rate-limiting mechanism on all of our diary instances. This will not affect your normal usage of the application, but it will prevent an outage from being caused by others. Any users sending an unreasonable number of requests in a short period of time will be limited in order to protect the overall health of the application. We’d like to reassure you that this will only occur for an extreme load of requests, which was the root cause of this outage.
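
As an illustration only, the kind of per-client limit described above can be sketched as a sliding-window counter. This is a minimal sketch under assumed names and thresholds (`SlidingWindowLimiter`, `max_requests`, `window_seconds` and the venue identifiers are all hypothetical), not the mechanism being deployed.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client in any `window_seconds` span.

    Hypothetical sketch: the class name, parameters and thresholds are
    assumptions for illustration, not the production configuration.
    """

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # client_id -> recent request times

    def allow(self, client_id):
        """Return True if the request may proceed, False if it is limited."""
        now = self.clock()
        hits = self._hits[client_id]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject instead of queueing
        hits.append(now)
        return True


if __name__ == "__main__":
    # Assumed threshold: 100 requests per 10 seconds per venue.
    limiter = SlidingWindowLimiter(max_requests=100, window_seconds=10)
    results = [limiter.allow("venue-a") for _ in range(150)]
    print(results.count(True), "allowed,", results.count(False), "limited")
    print("other venue allowed:", limiter.allow("venue-b"))
```

Because each client has its own counter, one venue exceeding the threshold is rejected early rather than filling the shared request queue, while other venues continue to be served normally.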

Posted Jan 27, 2020 - 15:24 UTC

Resolved

The UK / European instances of our diary applications experienced a major outage starting from 18:20 and ending at 18:30.

Posted Jan 24, 2020 - 19:00 UTC