Adds babysitter script for PR tests
I see two problems with our current automated PR tests:
- Many false negatives: PR tests often fail due to recurring failures
unrelated to the PR
- Information about what failed is awkward to retrieve because it is
scattered across multiple places
This commit adds a Python script, `scripts/babysitter`. Currently, each
of our Jenkins test suites runs wrapped in the `timeout` command.
The babysitter acts as a drop-in replacement for GNU timeout but adds
the following features:
- Logs machine-readable output about each test suite (as
line-delimited JSON)
- If the test suite uses NUnit, detects whether a test case failed or
crashed (terminating mono mid-test) and retries the unsuccessful tests
(up to a limit).
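
To illustrate the first feature, here is a minimal sketch of wrapping a command with a deadline and appending one JSON line per run. It is not the code in `scripts/babysitter`: the function name `run_logged` and the record fields (`invocation`, `exit_code`, `timed_out`, `duration_sec`) are assumptions made for this example.

```python
import json
import subprocess
import time


def run_logged(argv, timeout_sec, log_path):
    """Run argv with a deadline and append one JSON line describing the run.

    Hypothetical helper: the field names below are illustrative only.
    """
    start = time.time()
    timed_out = False
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # once timeout_sec has elapsed.
        code = subprocess.run(argv, timeout=timeout_sec).returncode
    except subprocess.TimeoutExpired:
        timed_out = True
        code = 124  # exit status GNU timeout reports on a timeout

    record = {
        "invocation": argv,
        "exit_code": code,
        "timed_out": timed_out,
        "duration_sec": round(time.time() - start, 3),
    }
    # One self-contained JSON object per line keeps the log appendable
    # and easy to parse line by line.
    with open(log_path, "a") as log:
        log.write(json.dumps(record) + "\n")
    return code
```

Because each line is a complete JSON object, a consumer can read the log with `json.loads` on every line without loading the whole file.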
The reasoning behind retrying is that tests which fail inconsistently
are currently more likely to be caused by one of our outstanding
recurring failures than by the change made in the PR. Therefore, if a
failing test succeeds on retry, the PR itself is probably valid
(although the failure should still be logged and looked at).
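
The retry policy described above can be sketched as follows. This is an illustration under stated assumptions, not the script's actual logic: `run_suite` is a hypothetical callable that runs the named tests (or all of them when passed `None`) and returns the names that failed or crashed, and `max_retries` stands in for the retry limit.

```python
def run_with_retries(run_suite, max_retries=2):
    """Rerun only the unsuccessful tests, up to max_retries times.

    run_suite(only) is a hypothetical callable: it runs the tests named
    in `only` (every test when `only` is None) and returns the names of
    the tests that failed or crashed.
    Returns (failed, flaky): tests still failing after all retries, and
    tests that failed at first but passed on a later attempt.
    """
    failed = set(run_suite(None))
    flaky = set()
    for _ in range(max_retries):
        if not failed:
            break
        still_failing = set(run_suite(sorted(failed)))
        # Anything that passed this time is recorded as flaky so it can
        # still be logged and investigated.
        flaky |= failed - still_failing
        failed = still_failing
    return failed, flaky
```

Under this sketch a run would count as passing when `failed` is empty, while `flaky` keeps the inconsistent tests visible for later investigation.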
In addition to the script itself, changes to NUnit were required in
order to support the retry feature and allow fine-grained logging for
NUnit suites.
Major TODOs:
- Add retry support for our non-NUnit-based test suites
- Save the XML files NUnit produces (since reruns overwrite the XML
files from previous runs)
- Add some kind of sensible feature for dealing with timeouts