Intermittent Connectivity Issues
Incident Report for ResDiary
Postmortem

All times in this report are in BST.
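
The status updates for this incident are stamped in UTC, whereas the times in this report are in BST (UTC+1). As a rough aid for lining the two up, the conversion can be sketched as below. The helper name `bst_to_utc` and the default date are illustrative assumptions, not part of any ResDiary tooling.

```python
from datetime import datetime, timedelta, timezone

# BST (British Summer Time) is UTC+1; a fixed offset is enough for a single day.
BST = timezone(timedelta(hours=1), name="BST")


def bst_to_utc(hhmm: str, day: str = "2019-05-13") -> str:
    """Convert an HH:MM time in BST on the given day to HH:MM in UTC.

    Hypothetical helper for reading this report; the default day is the
    date of the incident.
    """
    local = datetime.strptime(f"{day} {hhmm}", "%Y-%m-%d %H:%M").replace(tzinfo=BST)
    return local.astimezone(timezone.utc).strftime("%H:%M")


if __name__ == "__main__":
    # The mitigation deployment time quoted in this report, expressed in UTC.
    print(bst_to_utc("13:51"))
```

For example, a time given as 13:51 in this report corresponds to 12:51 UTC in the status update timestamps.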

Between around 10:00 and 17:00 on 13 May 2019, customers experienced intermittent problems accessing a number of our services, including diaries and widgets. Each time this happened, the affected sites became available again within a minute or two.

After investigation, we found that this was caused by network problems at Azure that started at around 10:05 and lasted until around 15:00.

During the incident, we contacted Azure to try to get more information about what was going on. While we were waiting for a response, we decided to investigate possible mitigating measures we could take to reduce customer impact.

At around 13:51 we deployed changes to login.resdiary.com and api.resdiary.com to try to mitigate the situation. After further monitoring, the mitigation appeared to be working, so we decided to roll it out to the rest of our applications.

Unfortunately, the measures we took may have caused some customers problems accessing www.resdiary.com, Classic Widgets in Southeast Asia and North Central USA, and Diaries in East Asia. We sincerely apologise for these problems. At the time we had to make a decision, since we did not know how long it would take Azure to resolve their network problems.

We will investigate whether we can avoid customer impact in future when taking this kind of mitigating action.

Further information about the Azure issue can be found at https://azure.microsoft.com/en-gb/status/history/ under "Network Connectivity - Increased Latency", dated 13/5.

Posted May 14, 2019 - 10:41 UTC

Resolved
After continuing to monitor the system, we are confident the incident has been resolved. We will provide a post-mortem containing more details of what happened shortly.
Posted May 14, 2019 - 09:27 UTC
Update
Azure have confirmed that they have taken action to resolve the DNS issues, so most of our services should now be functioning normally.

Unfortunately some of the mitigating actions we took to resolve the problem may have caused some customers to have issues accessing certain services, including our booking portal, Classic Widgets in Southeast Asia and North Central USA, and Diaries in East Asia. These problems will automatically resolve themselves as our DNS updates propagate.

We will provide a further update once we are certain all of our services are working correctly again.
Posted May 13, 2019 - 15:54 UTC
Update
We have now heard back from Azure support, who have confirmed that there is currently a problem affecting Azure DNS. In the meantime, we have altered some of our services to try to mitigate the DNS problems. We have been monitoring the mitigation for around an hour now, and it appears to have worked successfully. As a result, we are now rolling out the change to the remaining services.

We will continue to monitor and will update this incident as we find out more information.
Posted May 13, 2019 - 13:54 UTC
Update
Temporary DNS issues with Azure caused requests to fail to reach our services for a short period of time. This has been occurring intermittently throughout the day for short periods. It does not affect diaries in the UK or Australia, but logging into the system may be affected. We have raised an issue with the Azure support team and are awaiting their response.
Posted May 13, 2019 - 12:57 UTC
Monitoring
Temporary DNS issues with Azure caused requests to fail to reach our services for a short period of time. We believe that the issue has been resolved and will raise an issue with the Azure support team.
Posted May 13, 2019 - 09:52 UTC
Investigating
We are currently investigating intermittent connectivity issues affecting our websites. We will provide updates as we learn more.
Posted May 13, 2019 - 09:42 UTC
This incident affected: ResDiary Application (UK/Europe, Australia, North Central US, S.E. ASIA, East Asia) and Login, API, Widget Configurator, Dishcult Portal, Sales website.