AU diaries experiencing degraded performance

Incident Report for ResDiary

Postmortem

At around 07:15 GMT, automated alerts started firing, indicating that the Australian diary application was experiencing performance problems. Unfortunately, because of a configuration problem that has since been fixed, our on-call engineers did not notice the alerts until 07:25.

Our engineers investigated and found that a large backlog of requests had built up and was not being processed in a timely manner. At 07:35 they decided to recycle one of the web servers to allow it to return to a healthy state. Initially it looked like this might have been enough to solve the problem, but after 10 minutes they decided to recycle the other web servers as well.

After doing this, all the servers began to process requests normally, and by 07:48 the system was operating normally again.

Unfortunately, the root cause of this problem is not yet clear. We are continuing to investigate, and have begun to implement mitigating measures to help prevent the servers from becoming overloaded like this.

Update 27/11/2018

After investigating a similar incident that occurred on Thursday 22/11/2018, we believe this incident was triggered by a problem with the backend database server for the Australian diary. See https://status.resdiary.com/incidents/1mktl1r85ps7 for more details of this incident, and the steps we are taking to mitigate this in future.

Posted Nov 22, 2018 - 09:28 UTC

Resolved

We have been monitoring the AU diary for the past few hours, and the system has been behaving normally since action was taken to resolve the degraded performance. We are now investigating the cause of the problem, and will provide a post-mortem report later.

Posted Nov 16, 2018 - 10:19 UTC

Monitoring

High database usage on the AU server was causing new requests to be very slow.

The root cause has still to be identified, but we have resolved the issue for now, and will be monitoring the system closely. A full report will follow.

Posted Nov 16, 2018 - 07:54 UTC

Investigating

The AU diaries are experiencing degraded performance. Engineers are investigating and will provide an update shortly.

Posted Nov 16, 2018 - 07:34 UTC

This incident affected: ResDiary Application (Australia).