All times in this report are in BST.
- At 18:46 on 11/09/2019, updates were installed on one of our Australian diary application servers, triggering an automatic reboot.
- At 05:22 on 12/09/2019, another of the servers was automatically rebooted after having updates installed.
- Unfortunately, the web servers on these two machines did not start automatically after the reboots, which reduced our capacity.
- The remaining servers were able to cope with the traffic until around 07:45, at which point a backlog of requests started to build up.
- At 08:14, automatic alerts notified our engineers of a problem with Australian diaries, and they started to investigate.
- By 08:28, the system had become completely unable to cope with the traffic being received, and most users would have had problems accessing the diary.
- At 08:40, our engineers realised that the web servers on two of our servers were not running, and started them. This resolved the problem.
As a result of this incident, we have added alerts that notify us immediately when any of our web servers becomes unavailable, and have put new procedures in place so that we can resolve issues like this before they affect customers.
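As an illustration of the kind of check that sits behind an availability alert, here is a minimal Python sketch. It is not our actual monitoring. It assumes each web server exposes an HTTP URL that can be probed, and the function names `is_up` and `find_unavailable` are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the server at `url` answers with an HTTP status below 500.

    Assumption: a connection failure, a timeout, or a 5xx response all mean
    the web server is unavailable.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as exc:
        # The server responded, but with an error status (4xx or 5xx).
        return exc.code < 500
    except OSError:
        # Covers URLError, refused connections, and timeouts.
        return False


def find_unavailable(
    urls: Iterable[str], probe: Callable[[str], bool] = is_up
) -> List[str]:
    """Return the URLs whose probe failed, in the order they were given."""
    return [url for url in urls if not probe(url)]
```

Run on a short schedule against every web server, a check like this would flag a server whose web process failed to start after a reboot within minutes, rather than once a backlog builds up. The `probe` parameter lets the check be exercised without network access.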
We apologise for any inconvenience caused, and will continue to improve our processes to try to prevent incidents like this from occurring.