We strive to provide the most stable and user-friendly CI platform possible so that you and your teams can focus on shipping amazing open source and commercial software. When any portion of our service is unavailable, we know it can bring your productivity to a screeching halt. As developers building a tool for other developers, we understand firsthand how frustrating and debilitating this can be.
We want to take the time to explain what happened. We recognize that this was a significant disruption to the workflow and productivity of all of our users who rely on us for macOS building and testing. This is not at all acceptable to us. We are very sorry that it happened.
The following is a timeline of the events during this outage.
Note: All times are in UTC.
Sep 08, 2017 - 15:01 UTC: We're currently working with our infrastructure provider to reboot one of our vCenter instances to resolve issues with an unresponsive SAN. macOS builds for public and private repositories are stopped.
Sep 08, 2017 - 16:52 UTC: We've rebooted our vCenters and continue to work on stabilizing things. All macOS builds remain stopped.
Sep 08, 2017 - 19:52 UTC: We're continuing to work on getting things into a stable state where we can potentially start running builds. At the moment we do not have an ETA for when we will resume builds. We are very sorry for the delays and will update this incident when we know more. Thank you for your patience.
Sep 08, 2017 - 21:29 UTC: We're working on cleanup to stabilize our SAN storage. At the moment we do not have an ETA for when we will resume builds. We are very sorry for the delays and will update this incident when we know more. Thank you for your patience.
Sep 09, 2017 - 01:04 UTC: In order to help things become stable and reliable going forward, we're undertaking intense cleanup of our SAN filesystem. This cleanup is likely to take all weekend. Because of this, we're only able to resume a portion of our capacity for private builds and will not be resuming shared public builds yet. We do not currently have an ETA for when we'll be able to resume shared public builds. We will provide our next update in the morning PDT. We are very sorry for the delays and will update this incident when we know more. Thank you for your patience.
Sep 09, 2017 - 01:28 UTC: We ran into an issue with booting Xcode 8.x images, so all builds are suspended again. We'll update when private builds are running.
Sep 09, 2017 - 03:13 UTC: We've resumed running private builds at this time. We'll provide further updates on the overall progress tomorrow morning PDT. Thank you for your patience.
Sep 09, 2017 - 13:04 UTC: The backlog for private repository builds has been clear for ~4h. We are planning to bring partial capacity for public repositories back online shortly.
Sep 09, 2017 - 14:32 UTC: Capacity for macOS public repositories has been back online for ~1 hr. We're adding capacity to work through the backlog.
Sep 09, 2017 - 15:11 UTC: We're continuing to process the public backlog while running SAN cleanup. We may still need to reduce or suspend public builds later in the weekend, depending on SAN progress. Thank you for your patience.
Sep 10, 2017 - 16:45 UTC: We've processed a backlog of approximately 9,600 macOS jobs for public repositories since re-enabling public macOS builds at 07:00 PDT yesterday. As we're still at reduced capacity and working on cleaning the SAN, we still have a backlog of ~150-200 jobs and continue to actively process them. We'll provide updates as things progress today. Thank you for your patience.
Sep 10, 2017 - 19:45 UTC: Public build capacity is temporarily reduced further while we take some actions to continue our SAN cleanup. We'll provide another update when that capacity has been restored.
Sep 10, 2017 - 19:56 UTC: We're now running with the previous capacity for public builds, which is still reduced from our "normal" capacity. We are continuing with SAN cleanup. We'll provide updates as things progress today. Thank you for your patience.
Sep 11, 2017 - 03:45 UTC: We've completed the first phase of our SAN cleanup. Things are stable and so we're working to resume full public macOS build capacity. We'll provide another update when that's complete.
Sep 11, 2017 - 03:57 UTC: We've resumed full build capacity for public builds. We will be monitoring things overnight and will provide further updates in the morning PDT. Thank you for your patience.
Sep 11, 2017 - 15:47 UTC: We're seeing instability in part of our private macOS build capacity, so we're temporarily reducing capacity.
Sep 11, 2017 - 16:50 UTC: We're resuming full private macOS build capacity.
Sep 11, 2017 - 19:52 UTC: The backlog has cleared for private builds. We are continuing to monitor the situation for public/open source builds. Thanks for hanging in there with us.
Sep 11, 2017 - 21:11 UTC: The public macOS build backlog has reached normal peak levels and things are remaining stable. We're closing the incident at this time. A postmortem blog post will be published in the next few days and we'll share it on Twitter when it's published. Thank you everyone for your patience and understanding during this extended incident. Incident is Resolved.
The major contributing factors in this outage were
We couldn't be more sorry about this incident and the impact that the build outages and delays had on you, our users and customers. We always use problems like these as an opportunity for us to improve, and this will be no exception.
Thank you for your continued support of Travis CI. We are working hard to live up to the trust you've placed in us and to provide an excellent build experience for your open source and private repository builds. We know that the continuous integration and deployment tools we provide are critical to your productivity.
If you have any questions or concerns that were not addressed in this postmortem, please reach out to us at support@travis-ci.com and we'll do our best to answer them.