ResNet Server Infrastructure Problems – 24th March, RESOLVED

Part of the infrastructure which runs the ResNet back end has developed a hardware fault, and some of the services it’s running are currently in a degraded state. We need to move services off it as soon as possible.

The following services are affected:

– http://go.resnet.bris.ac.uk – will be unavailable for up to an hour while we move it
– http://my.resnet.bris.ac.uk – will be unavailable for up to an hour while we move it
– DNS (ResNet Wired) will be in a degraded state, and web browsing for 50% of currently connected ResNet customers will be slower than usual
– DHCP (ResNet Wired) will be working, in a non-resilient state
– The ResNet channel in portal.bris.ac.uk will be unavailable for up to an hour while we move it

Various other administrative tools (e.g. the system used by the Service Desk to troubleshoot ResNet connection problems) will also be affected.

Important: Wireless is not affected at all, so if you’re experiencing wired connection issues, try the wireless.

Updates will follow as we get them.

Update 24th March, 16:15
– http://go.resnet.bris.ac.uk – is now available again
– http://my.resnet.bris.ac.uk – is now available again
– The ResNet channel in portal.bris.ac.uk is now available again
– DNS/DHCP (ResNet Wired) – is now back in service in a non-resilient state (both the primary and secondary are on the same hypervisor), which will get us through tonight but isn’t ideal in the long term.

Update 24th March, 16:56
We’ve started to move people onto a new pair of DNS servers to improve resilience and reduce the load on the non-resilient hypervisor. Early indications are that the new servers are performing nicely and that people are moving across to them smoothly as their DHCP lease renews.

Update 25th March, 10:51
The temporary hardware we used to restore service yesterday is overcommitted, and while it’s working, it’s at the limit of what it can handle. We will be moving services off it today onto a more permanent home, and we’ll try to minimise downtime while we do so.

Update 26th March, 09:28
Some services are taking substantially longer than expected to migrate. We kicked off the migration of 3 VMs last night, and they’ve still not fully completed. The following services are currently unavailable:
– http://go.resnet.bris.ac.uk – (still migrating)
– http://my.resnet.bris.ac.uk – (still migrating)
– http://www.resnet.bris.ac.uk redirects (the content is still available at http://www.bristol.ac.uk/it-services/advice/homeusers/resnet)

Update 26th March, 11:00
Everything is back up on new hardware.
