Issues reaching Australian diaries
Incident Report for ResDiary
Postmortem

All the times in this report are in BST.

  • At 18:46 on 11/09/2019 updates were installed on one of our Australian diary application servers, triggering an automatic reboot.
  • At 05:22 on 12/09/2019, another one of the servers was automatically rebooted after having updates installed.
  • Unfortunately, the web servers on these two machines didn't start automatically after the reboots, which reduced our capacity.
  • The remaining servers were able to cope with the traffic until around 07:45, at which point a backlog of requests started to build up.
  • At 08:14 engineers were notified by automatic alerts that there was a problem with Australian diaries and started to investigate.
  • By 08:28, the system had become completely unable to cope with the traffic being received, and most users would have had problems accessing the diary.
  • At 08:40, our engineers realised that the web servers on two of our servers were not running, and started them. This resolved the problem.

As a result of this incident, we have added alerts that notify us immediately when any of our web servers becomes unavailable, and have put new procedures in place to make sure we can resolve issues like this before they start affecting customers.
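
For readers curious what an availability alert of this kind can look like, here is a minimal illustrative sketch in Python. It is not our actual monitoring setup: the health-check URLs, the `http_probe` and `find_unavailable` names, and the five-second timeout are all assumptions made for the example. The idea is simply to probe each web server directly, rather than relying on overall site health, so that a server that failed to start after a reboot is flagged even while the remaining servers are still coping.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status.

    Connection failures, timeouts and HTTP error statuses all count as
    "unavailable". The timeout value is an assumption for this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def find_unavailable(
    urls: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> List[str]:
    """Return the URLs whose probe failed, in the order they were given."""
    return [url for url in urls if not probe(url)]


# Hypothetical usage, run on a schedule by a monitoring job:
#
#   servers = [
#       "http://app-server-1.example.internal/health",
#       "http://app-server-2.example.internal/health",
#   ]
#   down = find_unavailable(servers)
#   if down:
#       print("ALERT: web servers not responding:", ", ".join(down))
```

Passing the probe in as a parameter keeps the check easy to test without touching the network, and lets the same loop be reused with a different kind of check (for example, a TCP connection test).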

We apologise for any inconvenience caused, and will continue to improve our processes to try to prevent incidents like this from occurring.

Posted Sep 25, 2019 - 09:53 UTC

Resolved
After continuing to monitor the system, we are confident the incident has been resolved. We will provide a post-mortem containing more details of what happened shortly.
Posted Sep 12, 2019 - 08:38 UTC
Identified
Engineers have identified the problem and implemented a fix. We will continue to monitor and will provide an update once we are sure the situation has been resolved.
Posted Sep 12, 2019 - 07:48 UTC
Investigating
We are currently experiencing issues with the servers that host our Australian diaries (au.resdiary.com). Engineers are investigating and will provide an update shortly.
Posted Sep 12, 2019 - 07:37 UTC
This incident affected: ResDiary Application (Australia).