[swift-dev] Measuring MEAN Performance (was: Questions about Swift-CI)
mgottesman at apple.com
Tue Jun 13 12:11:52 CDT 2017
So I did a bit more research. Check out how LNT does this:
I talked with Chris Matthews (+CC) about how LNT uses Mann-Whitney. In the following let n be the number of samples taken. From what he told me this is what LNT does:
1. If n < 5, then some sort of computation around confidence intervals is used.
2. If n > 5, then Mann-Whitney U is done.
I am not 100% sure what 1 is, but I think it has to do with some sort of quartile measurements, i.e., find the median of the new data and make sure it is within +- the median absolute deviation (basically mean +- std-dev, but more robust to outliers). I believe the code is in LNT, so we can find out for sure.
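To make the two cases concrete, here is a minimal sketch in Python. It is not LNT's actual code. The small-sample branch is my guess from above (new median within +- MAD of the old median), the p-value uses a plain normal approximation without tie or continuity correction, and the names `mad`, `mann_whitney_p`, `differs`, `alpha`, and `small_n` are all made up for illustration. What happens at exactly n = 5 is not specified above; the sketch arbitrarily sends it to Mann-Whitney U.

```python
from math import erfc, sqrt
from statistics import median


def mad(xs):
    """Median absolute deviation of xs around its own median."""
    m = median(xs)
    return median([abs(x - m) for x in xs])


def mann_whitney_p(a, b):
    """Two-sided p-value for Mann-Whitney U.

    Uses the normal approximation with no tie or continuity correction,
    so it is only a rough guide when the sample counts are small.
    """
    n1, n2 = len(a), len(b)
    # U counts the pairs where a's sample exceeds b's; ties count as half.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    mean = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sigma
    return erfc(abs(z) / sqrt(2))


def differs(old, new, alpha=0.05, small_n=5):
    """Return True if `new` looks different from `old`.

    Assumption: below `small_n` samples, compare medians against the MAD
    of the old samples; otherwise use Mann-Whitney U at level `alpha`.
    """
    n = min(len(old), len(new))
    if n < small_n:
        return abs(median(new) - median(old)) > mad(old)
    return mann_whitney_p(old, new) < alpha
```

With four samples per side, `differs([10, 10, 11, 12], [20, 21, 20, 22])` takes the median/MAD path; with six or more per side it takes the Mann-Whitney path.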
Thus, in my mind, the natural experiments here in terms of Mann-Whitney U are:
1. For small numbers of samples, keep doing the sort of simple comparison that we do today, and if a regression is "identified", grab more before/after samples and run Mann-Whitney U.
2. Try out different values of n. I am not 100% sure whether 5 is the right answer or whether it should depend on the test.
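The first experiment could be sketched roughly like this in Python. Everything here is hypothetical: `run_old` and `run_new` stand in for whatever produces one timing sample for the before/after builds, `cheap_differs` stands in for today's simple comparison, `strong_differs` for a Mann-Whitney U test, and the defaults `n_small=3` and `n_large=10` are placeholders, since choosing n is exactly what the second experiment is about.

```python
def two_stage_check(run_old, run_new, cheap_differs, strong_differs,
                    n_small=3, n_large=10):
    """Escalate to a larger sample only when the cheap check flags a change.

    Returns (is_regression, old_samples, new_samples).
    """
    # Stage 1: a few samples and the simple comparison.
    old = [run_old() for _ in range(n_small)]
    new = [run_new() for _ in range(n_small)]
    if not cheap_differs(old, new):
        return False, old, new

    # Stage 2: top up both sides to n_large samples and run the stronger test.
    extra = n_large - n_small
    old += [run_old() for _ in range(extra)]
    new += [run_new() for _ in range(extra)]
    return strong_differs(old, new), old, new
```

The point of the split is that most benchmarks never reach stage 2, so the extra samples are only paid for when something already looks suspicious.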
Chris, did I get it right?
> On Jun 13, 2017, at 7:11 AM, Pavol Vaskovic via swift-dev <swift-dev at swift.org> wrote:
> On Tue, Jun 13, 2017 at 8:51 AM, Andrew Trick <atrick at apple.com> wrote:
> I’m confused though because I thought we agreed that all samples need to run with exactly the same number of iterations. So, there would be one short run to find the desired `num_iters` for each benchmark, then each subsequent invocation of the benchmark harness would be handed `num_iters` as input.
> That was agreed on in the discussion about measuring memory consumption (PR 8793) <https://github.com/apple/swift/pull/8793#issuecomment-297834790>. MAX_RSS was variable between runs, due to dynamic `num_iters` adjustment inside `DriverUtils` to fit the ~1s budget.
> This could work for keeping the num_iters the same during comparison between [master] and [branch], given we logged the num_iters from [master] and used them to drive the [branch] MAX_RSS measurement. I don't know how to extend this to make memory consumption comparable between different measurement runs (over time...), though.