Since approx 6.30pm tonight we have been experiencing severe problems which are stopping new people registering on ResNet. Existing people up and running are fine, although there are some parts of our website such as the knowledgebase which are inaccessible.
This is due to a fault with our DNS server which we are tracing and trying to fix at the moment.
UPDATE 8.30pm: we think we’ve now fixed the DNS problem. However, as DNS information takes a while to propagate, it will take a while longer (maybe up to an hour) until everything is back to normal.
Sorry to everyone who was trying to register tonight – please try again after 9.30pm tonight or tomorrow.
UPDATED UPDATE – 10:08pm: It’s still somewhat broken. A number of DNS entries haven’t propagated properly, and as such people still can’t register. I’m looking into it.
UPDATED UPDATED UPDATE – 10:48pm: I think I’ve identified the problem, but it’s not something I can fix. With a bit of luck, the chap who can fix it will read the email I’ve just sent him. Sorry everyone!
SAT MORNING 10AM: Everything now seems to be in the DNS that should be – haven’t spotted any problems so far. Thanks James who was fixing stuff at 4 in the morning!
SAT AFTERNOON 1.30PM: It’s not fixed after all! Registrations are failing towards the end of the process with an error:
Can’t call method “ReadMAC” on an undefined value at /var/www/register/cgi-bin/activate.account.pl line 607.
We’re trying to fix it. Please try in a few hours or tomorrow.
SAT AFTERNOON 3PM: Looking better – we’ve tweaked things (added all the hubs to our hosts file) and 12 people have now registered in the last half hour. We’re still testing stuff before we conclude that everything is sorted.
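The "undefined value" error means the script called ReadMAC on an object that was never created. Since adding the hubs to the hosts file helped, the hub lookup presumably depends on name resolution. A check along the following lines would turn that into a clearer failure. It is only a rough sketch in Python, not the actual registration script (which is Perl); `resolve_hub` and `HubLookupError` are hypothetical names invented for illustration.

```python
import socket


class HubLookupError(Exception):
    """Raised when a hub's hostname cannot be resolved (hypothetical helper)."""


def resolve_hub(hostname, resolver=socket.gethostbyname):
    """Return the hub's IP address, or raise a descriptive error.

    `resolver` is injectable so the check can be exercised without real DNS.
    """
    try:
        return resolver(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        raise HubLookupError(f"cannot resolve hub {hostname!r}: {exc}") from exc
```

Failing at the lookup step with the hub's name in the message would point straight at DNS, rather than at a line deep inside the registration code.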
SAT EVENING 6:33PM: Looking worse again! The database job that creates the DNS file has started failing in a pretty catastrophic way and is generating empty DNS configs. So we’ve effectively lost 99.9% of our DNS records. I’m working on a fix. -Paul
SAT EVENING 7:37PM: OK, I’ve now managed to get the database job to produce some output. I’ve taken a snapshot of the config files. The job is due to run again in about 3 minutes’ time; if it fails, I’ll roll back to the snapshot and turn the job off. This will cause problems for anyone who registers after the snapshot was taken, but at least it’ll get the service back for the other 3800 of you! -Paul
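One way to stop a failed run from replacing a good config with an empty one is to sanity-check the output before installing it, and to keep a snapshot of the last good file. The following is a minimal sketch in Python, not the actual ResNet job: the function names, the one-record-per-line format, the ";" comment marker and the 50% threshold are all assumptions made for illustration.

```python
import os
import shutil
import tempfile


def count_records(text):
    """Count non-blank, non-comment lines (assumed: one DNS record per line)."""
    return sum(
        1
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith(";")
    )


def install_if_sane(new_text, live_path, min_ratio=0.5):
    """Replace live_path with new_text unless the record count collapsed.

    Returns True if the new config was installed, False if it was rejected.
    """
    new_count = count_records(new_text)
    # An empty config is never worth installing.
    if new_count == 0:
        return False
    if os.path.exists(live_path):
        with open(live_path) as f:
            old_count = count_records(f.read())
        # Reject output that has lost most of its records.
        if new_count < old_count * min_ratio:
            return False
        # Keep a snapshot of the last good config for manual rollback.
        shutil.copy2(live_path, live_path + ".snapshot")
    # Write to a temp file first, then rename, so readers never see a partial file.
    directory = os.path.dirname(os.path.abspath(live_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(new_text)
    os.replace(tmp_path, live_path)
    return True
```

With a guard like this, a run that produces no records leaves the live file untouched, and the `.snapshot` copy is there if a bad config slips through anyway.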
SAT EVENING 8:38PM: Rolling back to the snapshot doesn’t seem to have helped the situation. The config files are all there, but the DNS entries don’t appear to be propagating. I’m still in the office though, and still working on it… -Paul
SAT EVENING 9:15PM: It’s still broken, but I’ve run out of ideas. As far as I can tell, it *should* be working – but isn’t. Hopefully one of the other guys on the team will have more luck. -Paul