BUIP052: (passed) Set up dedicated Continuous Integration

freetrader

Summary

Get a paid-for dedicated Continuous Integration (CI) service to do daily / nightly builds which cover the full test suite.

This would be an addition to the existing free service provided by Travis.

Motivation

Right now, BU is running on a free instance of Travis CI [1], which is a great service for open source projects.
However, the free tier limits build minutes, which means that BU cannot run the full suite of software tests (which can take up to a few hours to run through, depending on hardware).

Instead, it runs a limited subset of quicker tests (a kind of "smoke testing").

In practice, individual developers are limited in the build platforms available to them and do not always run the extended tests, which results in a lack of awareness of whether changes have broken parts of the software on some supported platforms.

Indeed, it is currently difficult, if not impossible, to obtain an accurate picture of the overall project test status (all applicable tests on all supported platforms). A dedicated CI service could run the full test suite once a day (testing the 'dev' and 'release' branch heads for various build platforms).


Benefits
  • Contributing developers need information on how reliably a test is expected to perform: when you develop a feature and a test fails, you want to know whether the test was already known to fail occasionally before you started, or whether the failure is likely due to your changes, etc.
  • Acts as a 'health check' for users who want to run BU. They can look at the test status to gauge whether they are satisfied with the software's health and prepared to take the risk of running it.
  • Helps the project maintain quality by making sure merged code does not break the software accidentally (daily feedback on the merged 'dev' and 'release' branches).

Implementation


This BUIP does not intend to prescribe a particular service or provider, but strongly recommends going with an existing, well-known service with a good reputation and a public interface.

A 'Startup' Travis instance (described as 'best for small teams') costs $129 / month, i.e. ~$1548 / year, and allows 2 concurrent jobs and unlimited build minutes / repositories / collaborators.

A 'Small Business' Travis instance (described as 'best for growing teams') costs $249 / month, i.e. ~$2988 / year, and allows 5 concurrent jobs and unlimited build minutes / repositories / collaborators.

$3K / year for reliable full-spectrum daily test status information accessible to all project members would be a good investment in the author's view.


Notes
  • The existing free Travis service, which we are using to test PRs and merge commits, should remain in operation unchanged by this BUIP (this is a recommendation). The free service is useful as-is; a lot of effort has been invested in it, and there is no major benefit to moving everything to another service.
  • The daily / nightly tests can also be run quite differently, requiring a different CI configuration. For example, failing tests could be re-run several times to gauge whether a failure is random and to estimate its probability. Passing tests could also be repeated with different execution orders, various random seeds, etc. There is a lot that can be done to get more mileage out of the existing tests (a sketch follows after this list).
  • The author is unaware of Bitcoin Core or other Satoshi-based clients publishing full test suite run data to the public. This is certainly not optimal, and a good opportunity for BU to improve upon current practice and set a good example.
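To make the re-run idea above a bit more concrete, here is a rough sketch of what such a nightly helper could look like. Nothing here exists yet: the test names, the repetition count and the `--randomseed` option are assumptions (the tests may need a small patch to accept a seed).

```python
#!/usr/bin/env python3
"""Hypothetical nightly helper: re-run selected functional tests several
times with varying random seeds to estimate how flaky each one is."""
import random
import subprocess

# Assumption: these scripts live in qa/rpc-tests/ and can take a seed option.
TESTS = ["wallet.py", "mempool_resurrect_test.py"]
REPEATS = 5

random.shuffle(TESTS)  # vary execution order between nightly runs
for test in TESTS:
    failures = 0
    for _ in range(REPEATS):
        seed = random.randrange(2 ** 31)
        result = subprocess.run(
            ["python3", "qa/rpc-tests/" + test, "--randomseed=%d" % seed],
            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        if result.returncode != 0:
            failures += 1
    print("%s: %d/%d runs failed" % (test, failures, REPEATS))
```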

Optional Item 1: upgrade existing Travis for speedier builds

As mentioned by @solex in the bitco.in forum thread [2], Travis has made an offer to upgrade the standard CI service that BU is currently using, giving developers access to faster build times for the regular builds. This is offered at $2750 p.a.


Alternatives

Developers could set up dedicated test boxes at home / work and feed back test results into a common Git repository.

This would lack the nice interface etc., and would be more costly overall as people might have to acquire dedicated test hardware (although some of this could be obtained through sponsorship or donations).

One could make a separate BUIP for a project to aggregate this "ragtag" test data into something that's easily digestible for project members. I'm assuming this would end up costing more and not performing as well as using an experienced CI provider. The advantage is that it would be more decentralized.
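To give an idea of what such a test box could do, here is a rough sketch of a cron job that runs the suite and commits a summary into a shared results repository. The repository layout, the test runner path and its flag are assumptions, not something that exists today.

```python
#!/usr/bin/env python3
"""Hypothetical cron job for a developer-run test box: run the extended
test suite and commit a summary into a shared 'results' git repository."""
import json
import platform
import subprocess
import time

RESULTS_REPO = "results"  # assumed local clone of a shared results repo
SUITE_CMD = ["qa/pull-tester/rpc-tests.py", "-extended"]  # assumed entry point

run = subprocess.run(SUITE_CMD, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

summary = {
    "host": platform.node(),
    "platform": platform.platform(),
    "timestamp": int(time.time()),
    "returncode": run.returncode,
}
fname = "%s/%s-%d.json" % (RESULTS_REPO, summary["host"], summary["timestamp"])
with open(fname, "w") as f:
    json.dump(summary, f, indent=2)

# Push the result so other project members can aggregate and inspect it.
subprocess.run(["git", "-C", RESULTS_REPO, "add", "."], check=True)
subprocess.run(["git", "-C", RESULTS_REPO, "commit", "-m",
                "test results from " + summary["host"]], check=True)
subprocess.run(["git", "-C", RESULTS_REPO, "push"], check=True)
```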

Additional information

[1] https://travis-ci.com/plans
[2] https://bitco.in/forum/threads/buip052-set-up-dedicated-continuous-integration.2034/
 

sickpig

@freetrader

this is a great proposal.

Having the full Python test suite run once a day for both branches (`dev` and `release`) is definitely something valuable.

A few thoughts on the details:

- the free plan for open source projects provided by Travis has a max build time of 50 minutes, while all the paid plans have a limit of 120 minutes; that means we would probably have to split the build matrix even in this case (see the sketch after these two points).

- in terms of testing on different platforms, what Travis provides is not native builds but a cross-compile service. This means that the ARM, Apple and Windows binaries are cross-compiled on Travis servers that are actually Ubuntu-based x86-64 machines. This has 2 limitations: a) the `make check` tests can only be run for Linux 32/64-bit and Windows (via the Wine emulator); b) some kinds of build errors only happen when compiling natively.
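Just to show what splitting the matrix could look like in practice, here is a rough sketch of a slice runner each matrix job could call. The JOB_INDEX / JOB_COUNT variables and the test list file are made up for the example:

```python
#!/usr/bin/env python3
"""Hypothetical slice runner: each CI matrix entry sets JOB_INDEX / JOB_COUNT
and runs only its share of the functional tests, so every job stays under
the per-build time limit."""
import os
import subprocess
import sys

index = int(os.environ.get("JOB_INDEX", "0"))
count = int(os.environ.get("JOB_COUNT", "1"))

# Assumed flat text file listing one functional test script per line.
with open("qa/rpc-tests/all_tests.txt") as f:
    tests = [line.strip() for line in f if line.strip()]

failed = []
for test in tests[index::count]:  # round-robin slice for this matrix job
    if subprocess.call(["python3", "qa/rpc-tests/" + test]) != 0:
        failed.append(test)

if failed:
    print("failed:", ", ".join(failed))
sys.exit(1 if failed else 0)
```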

The idea of having developers actually run the full test suite on their own hardware is a good one and could avoid the 2 issues I've just mentioned. Still, it would be more effective if coordinated in an automated fashion; what I have in mind is something like the buildfarm concept the PostgreSQL project is using, see https://buildfarm.postgresql.org/cgi-bin/show_status.pl

Basically, the idea is to have a website that collects all the results from the farm and logs all successful and failed builds. Each machine belonging to the build farm uses a tailored Perl script to build and run the regression tests. Once the process finishes, or aborts due to errors, the logs and the status are forwarded to the main collector server, then categorized and made accessible via the website.
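As a rough sketch of what a farm client could look like (the collector URL and payload format are invented here, and the real PostgreSQL client is a Perl script, not Python):

```python
#!/usr/bin/env python3
"""Sketch of a buildfarm client in the spirit of the PostgreSQL buildfarm:
build, run the regression tests, then forward status and logs to a collector.
The collector URL and payload format are assumptions, not an existing service."""
import json
import platform
import subprocess
import urllib.request

COLLECTOR_URL = "https://buildfarm.example.org/upload"  # hypothetical

def run_stage(cmd):
    p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return p.returncode, p.stdout.decode(errors="replace")

status = {"machine": platform.node(), "platform": platform.platform(), "stages": {}}
log = ""
for stage, cmd in [("configure", ["./configure"]),
                   ("build", ["make", "-j4"]),
                   ("check", ["make", "check"])]:
    rc, out = run_stage(cmd)
    status["stages"][stage] = rc
    log += out
    if rc != 0:
        break  # abort on the first failing stage

payload = json.dumps({"status": status, "log": log}).encode()
req = urllib.request.Request(COLLECTOR_URL, data=payload,
                             headers={"Content-Type": "application/json"})
urllib.request.urlopen(req)
```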

To make a long story short, in the near term having Travis run the full test suite once a day for each active branch is the way to go. We need to be aware of the limitations though (no native builds and limited ability to run the functional test suite).

In the mid/long term I wonder if we could set up something similar to the buildfarm PostgreSQL is using.
 

freetrader

@sickpig, thanks for the great feedback.

About the 120-minute limit: Travis' website is definitely providing too little information / misleading advertising with its "unlimited build minutes" claims. Sad! Not sure if e.g. `pruning.py` would be feasible; on my system it takes several hours (I've never had it pass, unfortunately).

But of course there could be other self-hosted options like Jenkins instances. A Travis job could probably be made to farm out extremely long tasks to other servers?

We should probably emphasize testing of the builds we ship as official, even if they are cross-compiled. It's my understanding (please correct me) that what we currently ship are those cross-compiled products.

The native builds add an extra dimension which is definitely useful to test though.

> In the mid/long term I wonder if we could set up something similar to the buildfarm PostgreSQL is using.

This sounds like a good idea, although a lot of work. Maybe an effort that could be tackled in cooperation with other projects that are willing to contribute to a shared test infrastructure?
 

deadalnix

I would be against in this form. Let me explain.

The length of an iteration cycle is key to productivity. Reducing productivity in turn reduces the ability to fix problems, so that is a losing strategy in the mid term. Making CI faster is always good, but, in the end, you don't want to run slow tests at the PR level, because the cost in terms of iteration outweighs the benefits.

The solution to this problem is 2 stage testing (ideally 3, but the first stage is costly to put in place, so 2 will do).

You want to have a bot checking out master and running the whole battery of tests again and again. When a test fails, the bot can then bisect that test specifically and assign the blame. The faulty diff can be patched if a solution is known and the author is available; otherwise it is reverted.
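The blame assignment part is basically what `git bisect run` already gives you. A rough sketch, with the known-good ref and the failing test as placeholders:

```python
#!/usr/bin/env python3
"""Sketch of the blame-assignment step: once the master-watching bot sees a
test fail, bisect between the last known-good commit and master, rebuilding
and re-running just that test at each step."""
import subprocess

GOOD_REF = "last-known-good"             # placeholder: ref that passed the test
FAILING_TEST = "qa/rpc-tests/wallet.py"  # placeholder: test that now fails

subprocess.run(["git", "bisect", "start", "master", GOOD_REF], check=True)
# `git bisect run` marks a commit bad whenever the command exits non-zero,
# so rebuild and run the single failing test at each bisection step.
step_cmd = "make -j4 >/dev/null && python3 " + FAILING_TEST
result = subprocess.run(["git", "bisect", "run", "sh", "-c", step_cmd],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
# Output ends with "<sha> is the first bad commit" -- that is the blame.
print(result.stdout.decode(errors="replace"))
subprocess.run(["git", "bisect", "reset"], check=True)
```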
 

freetrader

> you don't want to run slow tests at the PR level
@deadalnix , I think you misunderstood this proposal.

This is about stage 2, not the PR stage.
It is about running the whole battery of tests at least once a day (preferably more, as often as we can).

I don't know how I could have made it much clearer, but I've now bolded the 3 times I mentioned it in the document text.

The bisection is something I didn't think about adding to this proposal, but it would be very neat to have (who enjoys running bisections manually :eek: ?)

The stage 1 cycle for PRs and merge commits should remain unchanged for now.
 

deadalnix

OK, I was misled by the fact that you used the same vocabulary for this and the usual per-PR testing. I don't think this requires Travis per se, but if people think it is more appropriate, then so be it. We can have a machine that checks out master, runs the tests and reports, for cheap. I have a plan that I'll submit to BU soon containing various actionable items to improve devops, and this is one of them.

Maybe you want to edit the first post a bit to make it clearer that you are talking about running this on master and not on each PR.

For completeness, the one extra stage one can add for testing is at the PR creation point. You can run a selected set of tests based on the code that was changed and the tests covering that code. Doing so, you can run 10 tests and catch 80+% of errors before a PR is even created. But that's significantly harder to put in place, so we can do without it for now.
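A rough sketch of what that selection could look like; the mapping from source areas to tests is made up and would have to come from real coverage data:

```python
#!/usr/bin/env python3
"""Sketch of pre-PR test selection: look at which files a branch changes and
run only the tests known to cover those areas. The coverage map below is a
placeholder; in practice it would be derived from actual coverage data."""
import subprocess

# Hypothetical mapping from source areas to the tests that exercise them.
COVERAGE_MAP = {
    "src/wallet/": ["wallet.py", "keypool.py"],
    "src/rpc/":    ["rpcbind_test.py"],
    "src/net":     ["p2p-versionbits-warning.py"],
}

changed = subprocess.check_output(
    ["git", "diff", "--name-only", "origin/dev...HEAD"]).decode().splitlines()

selected = set()
for path in changed:
    for prefix, tests in COVERAGE_MAP.items():
        if path.startswith(prefix):
            selected.update(tests)

print("running %d selected tests instead of the full suite" % len(selected))
for test in sorted(selected):
    subprocess.call(["python3", "qa/rpc-tests/" + test])
```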
 

solex

Helpful initiative. I think $3k per year is good value for the enhanced testing as described.
(renamed to BUIP051 for indexing purposes).
 

solex

Travis gave us a free trial of a better service last week, and it gave us significantly faster build times. They quote $2750 per annum for this service level, and I suggest that this be added to the BUIP for membership approval.
 

freetrader

I want to point out that this proposal is not so much about upgrading our existing Travis service which is used for building PRs and merge commits, but about obtaining an additional, dedicated service which can be used to run the test suite in very different ways (starting with full runs of all tests).

This does not seem possible with an upgrade of the current Travis service - at least I have not heard anyone explain how this could be technically possible.
The current source distribution contains some Travis definition files which are tailored to the standard quick checks. I did not see any way in Travis to override this, and the 'cron' method would also use these standard definitions AFAIK. I am keen to hear from anyone who might know how to work around this Travis limitation.
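To illustrate the kind of branching a single set of definitions would need, assuming (hypothetically, and untested for our setup) that the build environment exposes whether a run was cron-triggered via Travis' documented TRAVIS_EVENT_TYPE variable, a wrapper script inside the existing definitions could look roughly like this; the test runner path and flag are assumptions too:

```python
#!/usr/bin/env python3
"""Hypothetical wrapper called from the existing CI definitions: run the quick
smoke checks for PR / merge builds, and the full suite only when the build was
cron-triggered. Assumes TRAVIS_EVENT_TYPE is available and that the runner
accepts an extended-suite flag -- both untested assumptions."""
import os
import subprocess
import sys

if os.environ.get("TRAVIS_EVENT_TYPE") == "cron":
    # Nightly: the full suite, potentially hours of runtime.
    cmd = ["qa/pull-tester/rpc-tests.py", "-extended"]  # assumed entry point
else:
    # PR / merge commit: keep only the fast smoke tests.
    cmd = ["qa/pull-tester/rpc-tests.py"]

sys.exit(subprocess.call(cmd))
```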

Though perhaps I'm misunderstanding the suggestion here. I would be happy to add the build speed upgrade offer from Travis to this BUIP as an additional separate option that can be voted on. As a developer I am of course always in favor of things that speed up the build ;-)
 

solex

Indeed. I see that; however, the Travis free service is proving inadequate, and this BUIP does give due recognition to the existing dependency we have on them, which is not likely to be superseded. So it makes a good fit to have the BUIP vote include an upgrade to the Travis Small Business level (at a slightly discounted rate), as well as authorise the additional service. Do we need to put a cost estimate on the latter?
 

freetrader

I've included it as Optional item 1.

More details on the significantly faster build times would be great. I'm assuming that because it's a paid plan, it allows builds to last up to 120 minutes?