Build delays in our OS X infrastructure
Incident Report for Travis CI
Resolved
The backlog for .org OS X builds has cleared. 👍🎉 Resolving incident.
Posted Dec 03, 2016 - 05:31 UTC
Update
The backlog on travis-ci.com has cleared; we're still monitoring the backlog on travis-ci.org.
Posted Dec 02, 2016 - 18:18 UTC
Update
We're continuing to work through the backlog without issues, but because of its size, builds are still facing long delays. We're continuously monitoring the situation in order to maintain the best possible throughput.

We're truly sorry for this lengthy interruption to your builds and we will be publishing a postmortem next week.
Posted Dec 02, 2016 - 17:11 UTC
Update
We are beginning to process the backlog at full capacity and continue to monitor closely.
Posted Dec 02, 2016 - 12:03 UTC
Monitoring
We are slowly starting up capacity and resuming builds.
Posted Dec 02, 2016 - 11:29 UTC
Identified
We identified an issue that is preventing build VMs from booting and are working on a fix.
Posted Dec 02, 2016 - 11:14 UTC
Investigating
We are experiencing some issues while restoring OS X capacity and are investigating.
Posted Dec 02, 2016 - 10:48 UTC
Monitoring
We are restarting the workers for our .org open source OS X builds and monitoring to ensure the system is stable.
Posted Dec 02, 2016 - 09:46 UTC
Update
Our infrastructure provider is placing our hypervisor host under maintenance while they perform additional clean-ups.
Posted Dec 02, 2016 - 06:46 UTC
Update
Our infrastructure provider is doing a rolling restart of all our physical hardware.
Posted Dec 02, 2016 - 06:09 UTC
Investigating
Our infrastructure provider is resolving an issue with hypervisor hosts. All OS X builds are stopped while they take down hosts, reload, and migrate.
Posted Dec 02, 2016 - 05:04 UTC
Monitoring
OS X builds are running at full capacity for .com and .org. Monitoring.
Posted Dec 02, 2016 - 04:45 UTC
Update
Host hypervisors have been restarted. OS X builds are resuming, first for .com, then for .org.
Posted Dec 02, 2016 - 04:27 UTC
Identified
We've stopped all OS X workers processing jobs for .com and .org while we stabilize our hypervisor hosts.
Posted Dec 02, 2016 - 03:21 UTC
Update
OS X .org workers have restarted and are resuming jobs at full capacity.
Posted Dec 02, 2016 - 02:40 UTC
Update
We are restarting the workers for our .org open source OS X builds. Please stand by as jobs will resume shortly after.
Posted Dec 02, 2016 - 02:06 UTC
Update
OS X .com backlog has cleared. Still processing open source OS X builds.
Posted Nov 30, 2016 - 23:54 UTC
Update
Resumed OS X builds at full capacity, monitoring performance.
Posted Nov 30, 2016 - 18:15 UTC
Monitoring
We have partially resumed OS X builds at reduced capacity and are placing some of our build VM hosts in maintenance mode while we continue to monitor performance issues.
Posted Nov 30, 2016 - 17:37 UTC
Update
Temporary stoppage on .org and .com OS X builds to drain the VM pool again, after which we will restart the flow of jobs at reduced capacity.
Posted Nov 30, 2016 - 16:45 UTC
Investigating
We are re-escalating this incident as we are experiencing high load on our hypervisor server, which is affecting all OS X build VMs for .org and .com.
Posted Nov 30, 2016 - 15:50 UTC
Update
Public repos are back to full capacity. We are working through the accumulated backlog.
Posted Nov 30, 2016 - 06:12 UTC
Update
The private repo backlog has cleared. We are continuing to run public repos at reduced capacity.
Posted Nov 30, 2016 - 04:21 UTC
Update
We have returned to full capacity for private repos.
Posted Nov 30, 2016 - 03:28 UTC
Update
We are still observing VM leakage, albeit very slight. We are going to remain at half capacity and reassess in 1 hour.
Posted Nov 30, 2016 - 02:44 UTC
Monitoring
We have restarted jobs at half capacity and are monitoring VM lifecycle metrics.
Posted Nov 30, 2016 - 01:45 UTC
Update
We have stopped all jobs in order to drain the VM pool again, after which we will restart the flow of jobs at reduced capacity while monitoring VM leakage.
Posted Nov 30, 2016 - 00:54 UTC
Identified
We are re-escalating this incident given ongoing issues with VM lifecycle management.
Posted Nov 30, 2016 - 00:37 UTC
Monitoring
We have resumed all OS X builds on .org and .com at full capacity, and will start to work through the backlog while we monitor the recent changes.
Posted Nov 29, 2016 - 20:04 UTC
Update
Temporary stoppage on .org and .com OS X builds. We're pushing fixes to our VM cloud manager and hypervisor client, and will need to restart these services.
Posted Nov 29, 2016 - 19:40 UTC
Identified
OS X builds for .org and .com are processing at 50% capacity while we continue to debug our VM cloud manager issues.
Posted Nov 29, 2016 - 17:49 UTC
Monitoring
All the OS X workers are back online. Jobs will be delayed until we’re able to work through the backlog of jobs waiting to be built.
Posted Nov 29, 2016 - 17:23 UTC
Identified
We found an issue with our virtual machine cloud manager. We’ve now brought some of the workers back up and we’re slowly resuming OS X jobs.
Posted Nov 29, 2016 - 17:02 UTC
Update
We’ve stopped all OS X builds to investigate further and are working on restoring them as quickly as possible.
Posted Nov 29, 2016 - 16:39 UTC
Investigating
We’re experiencing issues booting OS X jobs. This is causing severe build delays in our OS X infrastructure. We are currently investigating and will post an update as soon as we know more.
Posted Nov 29, 2016 - 16:30 UTC