AU diaries experiencing degraded performance
Incident Report for ResDiary
Postmortem

At around 07:15 GMT, automated alerts started firing, indicating that the Australian diary application was experiencing performance problems. Because of a configuration problem, which has since been fixed, our on-call engineers did not notice the alert until 07:25.

Our engineers investigated and found that a large backlog of requests had built up and was not being processed in a timely manner. At 07:35 they decided to recycle one of the web servers to return it to a healthy state. Initially it looked as though this might be enough to solve the problem, but after 10 minutes they decided to recycle the other web servers as well.

After this, all the servers began to process requests normally, and by 07:48 the system was behaving normally again.

The root cause of this problem is not yet clear. We are continuing to investigate, and have begun implementing mitigating measures to help prevent the servers from becoming overloaded like this again.

Update 27/11/2018

After investigating a similar incident that occurred on Thursday 22/11/2018, we believe this incident was triggered by a problem with the backend database server for the Australian diary. See https://status.resdiary.com/incidents/1mktl1r85ps7 for more details of that incident, and the steps we are taking to mitigate this in future.

Posted 3 months ago. Nov 22, 2018 - 09:28 UTC

Resolved
We have been monitoring the AU diary for the past few hours, and the system has been behaving normally since action was taken to resolve the degraded performance. We are now investigating the cause of the problem, and will provide a post-mortem report later.
Posted 3 months ago. Nov 16, 2018 - 10:19 UTC
Monitoring
High database usage on the AU server was causing new requests to be very slow.

The root cause has still to be identified, but we have resolved the issue for now, and will be monitoring the system closely. A full report will follow.
Posted 3 months ago. Nov 16, 2018 - 07:54 UTC
Investigating
The AU diaries are experiencing degraded performance. Engineers are investigating and will provide an update shortly.
Posted 3 months ago. Nov 16, 2018 - 07:34 UTC
This incident affected: ResDiary Application (Australia).