We've been monitoring our system for a number of hours and things are now stable. Thanks again for your patience over the last few days.
Posted about 1 year ago. Mar 02, 2017 - 09:49 UTC
A fix has been implemented and we are monitoring the results.
Posted about 1 year ago. Mar 02, 2017 - 04:28 UTC
We've resumed all build processing at this point. Builds are starting and running as expected. Log display via the API and web UI is functional as well. We will be monitoring things closely for the next few hours and into tomorrow. Thank you to everyone for your patience, understanding, and the many kind words via Twitter.
Posted about 1 year ago. Mar 02, 2017 - 02:36 UTC
The database work is done. We are in the process of resuming services and beginning to process jobs again. We're still verifying things and will post another update once we're confident jobs are being processed as expected.
Posted about 1 year ago. Mar 02, 2017 - 02:20 UTC
Our database provider has asked to make some changes to the existing primary logs DB that require we stop processing new jobs temporarily.
As a result, all builds will be paused, and log display will return an error from the API or web UI. We'll post an update once we've resumed builds.
Posted about 1 year ago. Mar 02, 2017 - 01:48 UTC
We are currently waiting on a new replica logs database to finish provisioning, and we plan to fail over to it once it is ready, which we expect to happen in roughly 5 hours.
Until then, delays in log display and some errors from the API/web UI should be expected. We are sorry for the extended length of this incident and appreciate your patience while we work through it with our database infrastructure provider.
Posted about 1 year ago. Mar 02, 2017 - 01:07 UTC
We are still working on a fix with our infrastructure provider.
Posted about 1 year ago. Mar 01, 2017 - 21:41 UTC
We're currently mostly stable, and we're actively working with our infrastructure provider on a more complete fix. Thanks for hanging in there with us!
Posted about 1 year ago. Mar 01, 2017 - 20:14 UTC
We have found a way to mitigate our degraded API performance in the short term. We continue to monitor performance and wait for the emergency failover database to provision. We are still experiencing a delay of logs in our web front end and will report back as soon as we can.
Posted about 1 year ago. Mar 01, 2017 - 15:52 UTC
Our ongoing database connection issues are due to emergency maintenance following the recent AWS outage. We are working with our upstream provider to rectify a kernel bug and are currently waiting for a new database failover to be provisioned. We expect this to take some time, and will continue to post updates as we have them.
Posted about 1 year ago. Mar 01, 2017 - 14:48 UTC
We have traced the partial outage to an intermittent database connection issue, and we're working to resolve it.
Posted about 1 year ago. Mar 01, 2017 - 11:53 UTC
We are experiencing a partial API outage on travis-ci.org, which is affecting performance of our web front end.