
INC #1429 - Cluster.02 Storage Segment 08 Issue

At approximately 1:40 PM PDT, (mt) Engineers resolved an issue with Cluster.02 Storage Segment 08. Resolving it required a reboot. The cause of the issue was determined to be NFS-related. We apologize for the inconvenience this caused.

This incident has been resolved and closed. If you feel that you are still affected by this issue, please update your Support Request within the AccountCenter.

#

INC #1429 - Cluster.02 Storage Segment 08 Issue

As of 12:40 PM PDT, (mt) Engineers began investigating an issue on Cluster.02 Storage Segment 08 that affected mail access and folder structure within the AccountCenter. These issues did not affect the uptime of your service. They are related to session and temporary data, which cannot be written at this time. (mt) Engineers have hidden the Storage Segment, which will make all services on this Storage Segment unavailable for a brief period of time. We apologize for this inconvenience and appreciate your patience with this matter.

To find the Cluster and Storage Segment for your (gs) Grid-Service, log into the AccountCenter, click “Admin” for your affected (gs) Grid-Service, then click “Server Guide”. That page lists the Cluster and Storage Segment for your (gs) Grid-Service.

#

#1425 - Services Restored to vzd020

All services have been restored to vzd020 as of 3:45 PM PDT.

After a quick analysis, our engineers confirmed that this was a temporary issue that a reboot of the host machine would resolve. After the reboot, system checks were performed and confirmed that all services were functioning normally.

#

Services on Host Server vzd020 are Unavailable

As of approximately 3:13 PM PDT, Host Server vzd020 has been experiencing some difficulties. This affects only (ve) Servers hosted on the physical host machine vzd020.
To see which host server your (ve) Server resides on, please see the Server Guide page in the AccountCenter.

(mt) Engineers are working as quickly as possible to restore all services to this host. Updates to this page will be made as soon as more information is available. We apologize for any inconvenience and we thank you for your patience.

#

#1420 - Incident Review

This post is a summary of Incident #1420, covering a period of authentication issues with the (gs) Grid-Service.

Earlier today, the AccountCenter became unavailable for approximately 15 minutes due to MySQL replication errors. Soon after, we began receiving reports of failed email and FTP authentication from customers on various Clusters. After some investigation, we determined that a portion of the account authentication servers used by each (gs) Grid-Service Cluster were out of sync. These servers store all new password changes and sync them across our multi-node, clustered (gs) Grid-Service platform. They are replicated database slaves, which are normally self-healing.

(mt) Engineers identified the source of this issue and made the appropriate corrections to restore functionality to these servers. 

  • Date/Time: The issue started at approximately 3:15 PM PDT on Tuesday, July 27, 2010 and was resolved by 6:30 PM PDT. Service impact was variable across the (gs) Grid-Service during this time.
  • Symptoms: Customers creating or modifying email addresses or updating FTP/SSH passwords may have experienced authentication errors.
  • Impact: All (gs) Grid-Service Clusters were affected.  Email was not lost during this time.
  • Root Cause and Takeaways: Although our investigation is ongoing, we have identified a point at which the binary logs required for replication were corrupted. Going forward, we are looking into system changes that would help prevent this issue from recurring. We will also be looking into making our replication repair utilities more efficient. That improvement would allow us to repair replication services for all (gs) Grid-Service Clusters simultaneously.

This concludes this System Incident. If you feel that you are still experiencing the symptoms outlined in this post, please open a support request from the (mt) AccountCenter.

#

#1420 - (gs) Grid-Service Replication Services Restored

As of 6:27 PM PDT, all (gs) Grid-Service clusters are operating with replication services fully restored. A full incident review will be published later this evening once we’ve examined the root cause and outlined potential takeaways moving forward.

Once again, we appreciate your patience as we worked to resolve this matter.

#

#1420 - (gs) Grid-Service Cluster.03 and Cluster.04 Replication Services Restored

As of 5:45 PM PDT, replication services for (gs) Grid-Service Cluster.03 and Cluster.04 have been restored. To recap, Cluster.01, 02, 03 and 04 should be operating normally. We will continue to repair the remaining clusters and update this status page accordingly.

#

#1420 - (gs) Grid-Service Cluster.02 Replication Services Restored

As of 5:20 PM PDT, replication services for (gs) Grid-Service Cluster.01 and Cluster.02 have been restored. Additional work must be done to correct replication on the remaining clusters. As noted before, we will continue updating this status page as replication services normalize for each (gs) Grid-Service cluster. Please note this is not affecting any (dv) Dedicated-Virtual or (ve) Servers at this time.

Thank you for your patience and understanding in this matter.

#

#1420 - (gs) Grid-Service Cluster.01 Replication Services Restored

Shortly after our last update, our engineering team confirmed that replication services for Cluster.01 have been restored. They have now moved on to repairing the rest of the (gs) Grid-Service clusters. To reiterate the common symptoms associated with this incident, you may have trouble logging in with, or creating, email/FTP/SSH users. You may also have trouble updating email/FTP/SSH user passwords from within the AccountCenter. These symptoms are caused by the replication issue and will be rectified as soon as possible.

As the other (gs) Grid-Service clusters are repaired, we will post additional updates to this status page.

#

#1420 - (gs) Grid-Service Email, FTP, SSH Authentication Issues

As of 4:30 PM PDT, (mt) Engineers are still actively investigating this issue. The repair sequence for our replication servers is already underway, and Cluster.01 should be normalizing shortly. As each (gs) Grid-Service cluster’s replication service returns to normal, we will update this status page with further information.

#