[swift-dev] Measuring MEAN Performance (was: Questions about Swift-CI)

Michael Gottesman mgottesman at apple.com
Tue Jun 13 12:11:52 CDT 2017

So I did a bit more research. Check out how LNT does this:

https://github.com/llvm-mirror/lnt <https://github.com/llvm-mirror/lnt/search?utf8=%E2%9C%93&q=mann-whitney&type=>

I talked with Chris Matthews (+CC) about how LNT uses Mann-Whitney. In the following let n be the number of samples taken. From what he told me this is what LNT does:

1. If n < 5, some sort of confidence-interval computation is used.
2. If n > 5, the Mann-Whitney U test is run.

I am not 100% sure what 1 is, but I think it involves some sort of quartile measurement, i.e. find the median of the new data and make sure it is within +- one median absolute deviation of the baseline median (basically mean +- std-dev, but more robust to outliers). I believe the code is in LNT, so we can find out for sure.
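To make my guess about 1 concrete, here is a minimal Python sketch of a median / median-absolute-deviation check. This is only my reading of the idea, not LNT's actual code; the function names `median_abs_deviation` and `within_noise` are made up for illustration.

```python
from statistics import median


def median_abs_deviation(samples):
    """Median of the absolute deviations from the sample median."""
    m = median(samples)
    return median([abs(x - m) for x in samples])


def within_noise(old_samples, new_samples):
    """Hypothetical check: is the new median within +- one MAD of the old median?"""
    old_median = median(old_samples)
    mad = median_abs_deviation(old_samples)
    return abs(median(new_samples) - old_median) <= mad
```

The MAD barely moves when a single sample is wildly off, which is why it is a more robust noise estimate than the standard deviation for a handful of samples.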

Thus, in my mind, the natural experiments here in terms of Mann-Whitney U are:

1. LNT's approach suggests that for small n we keep doing the sort of simple comparison we do today and, if a regression is "identified", we grab more before/after samples and run Mann-Whitney U.
2. Try out different values of n. I am not 100% sure whether 5 is the right threshold, or whether it should depend on the test.
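A rough Python sketch of that two-stage flow, under several assumptions of mine: the first pass is a plain median comparison with a made-up 5% tolerance, the p-value uses the normal approximation without tie correction (which is crude for small n), and `min_samples` is the knob from experiment 2. None of these names or defaults come from LNT or from our current scripts.

```python
import math
from statistics import median


def mann_whitney_u(a, b):
    """Return (U statistic for a, two-sided p-value via normal approximation)."""
    n, m = len(a), len(b)
    # U counts the pairs (x, y) with x > y; ties count as one half.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    mean = n * m / 2.0
    sigma = math.sqrt(n * m * (n + m + 1) / 12.0)  # assumption: no tie correction
    z = (u - mean) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p


def looks_regressed(old, new, tolerance=0.05):
    """Cheap first pass: is the new median slower by more than `tolerance`?"""
    return median(new) > median(old) * (1.0 + tolerance)


def confirm_regression(old, new, min_samples=5, alpha=0.05):
    """Second pass, meant to run on the larger before/after sample sets."""
    if min(len(old), len(new)) <= min_samples:
        raise ValueError("collect more samples before running Mann-Whitney U")
    _, p = mann_whitney_u(old, new)
    return p < alpha and median(new) > median(old)
```

The intended use would be to call `looks_regressed` on the few samples we already have, and only when it fires pay for extra before/after runs and hand those to `confirm_regression`.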

Chris, did I get it right?


> On Jun 13, 2017, at 7:11 AM, Pavol Vaskovic via swift-dev <swift-dev at swift.org> wrote:
> On Tue, Jun 13, 2017 at 8:51 AM, Andrew Trick <atrick at apple.com> wrote:
> I’m confused though because I thought we agreed that all samples need to run with exactly the same number of iterations. So, there would be one short run to find the desired `num_iters` for each benchmark, then each subsequent invocation of the benchmark harness would be handed `num_iters` as input.
> That was agreed on in the discussion about measuring memory consumption (PR 8793) <https://github.com/apple/swift/pull/8793#issuecomment-297834790>. MAX_RSS was variable between runs, due to dynamic `num_iters` adjustment inside `DriverUtils` to fit the ~1s budget.
> This could work for keeping the num_iters the same during comparison between the [master] and [branch], given we logged the num_iters from [master] and used them to drive the [branch] MAX_RSS measurement. I don't know how to extend this to make memory consumption comparable between different measurement runs (over time...), though.
> --Pavol
