
Internet Solutions (IS) admits it experienced a reboot on one of its mail clusters on 8 February, but says the problem was sorted out in minutes.
Users who were affected experienced a complete mail failure in the initial two hours of the outage, says Hayden Lamberti, innovation and technology manager at IS.
Lamberti says when the cluster rebooted, it was fully operational within 15 minutes. “However, one of the mail stores was corrupted. This mail store had 178 users, less than 0.5% of our user base on hosted Exchange, spread across our clients,” he adds.
Lamberti says they had two ways to solve this problem. “The first option was to rebuild the mailboxes from the backups; this was to take nine hours to complete.”
With this option, he says, users would have had no mail via the desktop client, the Webmail interface or their mobile phones for the duration of the rebuild.
The other option was to create new mailboxes from scratch. “The users would have been able to send and receive mail while we rebuilt the inbox into a restore folder within the hosted exchange inbox,” points out Lamberti.
He says the disadvantage of this would be that the end-users would lose their Outlook Explorer.
Lamberti explains that IS started with the first option, then opted for creating new mailboxes due to high call volumes from clients. “All users had the ability to send and receive mail once we recreated the mailboxes two hours later.”
Problem solved
The company says e-mail was in working order by 7pm the same day, but some users had service restored more quickly than others. “The mail was restored to the new inboxes one by one,” Lamberti points out.
“So while the last one was restored at 7pm the same day, others were done much sooner, starting at 10am.”
Once the restore was complete, end-users would then have been trying to sync mail to the Outlook client over their connectivity, he says. “This is not geared to push this amount of data in a short period of time, thus users experienced large delays in synchronising the mailbox via Outlook and via the mobile devices.”
Lamberti says this highlights a key error whereby users are using large mailboxes as a mechanism for archiving and storage.
“They should use tailor-made archiving solutions to ensure that compliance risk is dealt with in a way that does not impact user experience and system usability in crisis outage situations.”
Lamberti says IS understands that e-mail is a critical business tool for all its clients and as such takes its e-mail environments very seriously.
“We understand that managing infrastructure like this is subject to failures outside of our control and in these instances ensure we have the capability in the form of processes and procedures to handle whatever happens in these environments.”
These include backup of mail, backup of configurations, standby hardware and even failover data centre environments, he says.