author    | Upstream <upstream-import@none> | 1970-01-12 13:46:40 +0000
committer | Upstream <upstream-import@none> | 1970-01-12 13:46:40 +0000
commit    | d8543bb6618c17b12da906afa77d216f58cf4058 (patch)
tree      | c58dc05ed86825bd0ef8d305d58c8205106b540f /SunSpider/TODO
external/webkit r30707
Diffstat (limited to 'SunSpider/TODO')
-rw-r--r-- | SunSpider/TODO | 70 |
1 file changed, 70 insertions, 0 deletions
diff --git a/SunSpider/TODO b/SunSpider/TODO
new file mode 100644
index 0000000..fee4bc0
--- /dev/null
+++ b/SunSpider/TODO
@@ -0,0 +1,70 @@
+
+* Add more test cases. Categories we'd like to cover (with reasonably
+  real-world tests, preferably not microbenchmarks) include:
+
+  (X marks the ones that are fairly well covered now).
+
+  X math (general)
+  X bitops
+  X 3-d (the math bits)
+  - crypto / encoding
+  X string processing
+  - regexps
+  - date processing
+  - array processing
+  - control flow
+  - function calls / recursion
+  - object access (unclear if it is possible to make a realistic
+    benchmark that isolates this)
+
+  I'd specifically like to add all the computer language shootout
+  tests that Mozilla is using.
+
+* Normalize tests. Most of the test cases available have a repeat
+  count of some sort, so the time they take can be tuned. The tests
+  should be tuned so that each category contributes about the same
+  total, and so each test in each category contributes about the same
+  amount. The question is, what implementation should be the baseline?
+  My current thought is to either pick some specific browser on a
+  specific platform (IE 7 or Firefox 2 perhaps), or try to target the
+  average that some set of same-generation release browsers get on
+  each test. The latter is more work. IE7 is probably a reasonable
+  normalization target since it is the latest version of the most
+  popular browser, so results on this benchmark will tell you how much
+  you have to gain or lose by using a different browser.
+
+* Instead of using the standard error, the correct way to calculate
+  a 95% confidence interval for a small sample is the t-test.
+  <http://en.wikipedia.org/wiki/Student%27s_t-test>. Basically this involves
+  multiplying the standard error by a value from a 2-tailed t-distribution
+  table instead of by 1.96; a table is available at
+  <http://www.medcalc.be/manual/t-distribution.php>.
+
+* Add support to compare two different engines (or two builds of the
+  same engine) interleaved.
+
+* Add support to compare two existing sets of saved results.
+
+* Allow repeat count to be controlled from the browser-hosted version
+  and the WebKitTools wrapper script.
+
+* Add support to run only a subset of the tests (both command-line and
+  web versions).
+
+* Add a profile mode for the command-line version that runs the tests
+  repeatedly in the same command-line interpreter instance, for ease
+  of profiling.
+
+* Make the browser-hosted version prettier, both in general design and
+  maybe using bar graphs for the output.
+
+* Make it possible to track change over time and generate a graph per
+  result showing result and error bar for each version.
+
+* Hook up to automated testing / buildbot infrastructure.
+
+* Possibly... add the ability to download iBench from its original
+  server, pull out the JS test content, preprocess it, and add it as a
+  category to the benchmark.
+
+* Profit.
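The "Normalize tests" item above amounts to choosing a repeat count per test so that every test (and, by extension, every category) contributes roughly the same time on a reference browser. What follows is a minimal TypeScript sketch of that tuning step, assuming per-iteration baseline timings measured on the chosen reference engine; the function name, the target time, and the timing numbers are illustrative and are not part of the SunSpider harness.

```typescript
// Hypothetical tuning sketch, not SunSpider harness code.
// Baseline timings are assumed to be per-iteration milliseconds measured
// on the chosen reference browser (e.g. IE 7); the numbers are made up.
type BaselineTimings = Record<string, number>;

function tuneRepeatCounts(baselineMsPerIteration: BaselineTimings,
                          targetMsPerTest: number): Record<string, number> {
  const repeats: Record<string, number> = {};
  for (const [test, msPerIteration] of Object.entries(baselineMsPerIteration)) {
    // At least one iteration, rounded so each test lands near the target time.
    repeats[test] = Math.max(1, Math.round(targetMsPerTest / msPerIteration));
  }
  return repeats;
}

// Example: aim for roughly 250 ms per test on the reference engine.
const repeats = tuneRepeatCounts(
  { "bitops-3bit-bits-in-byte": 2.1, "string-fasta": 25.0, "3d-cube": 12.5 },
  250,
);
console.log(repeats); // { "bitops-3bit-bits-in-byte": 119, "string-fasta": 10, "3d-cube": 20 }
```

A second pass with the same idea could then scale counts per category so that each category's total also comes out about even.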
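The confidence-interval item can be made concrete with a small sketch, again in TypeScript and again hypothetical rather than part of SunSpider: it multiplies the standard error of a handful of timing samples by the two-tailed 95% t critical value for n − 1 degrees of freedom, falling back to 1.96 for larger samples. The sample timings are made up.

```typescript
// Hypothetical sketch, not the SunSpider driver. Two-tailed 95% t critical
// values for small degrees of freedom; larger samples fall back to 1.96.
const T_95_TWO_TAILED: Record<number, number> = {
  1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
  6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
};

function confidenceInterval95(samples: number[]): { mean: number; plusMinus: number } {
  const n = samples.length;
  const mean = samples.reduce((a, b) => a + b, 0) / n;
  // Sample variance with Bessel's correction (n - 1).
  const variance = samples.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1);
  const standardError = Math.sqrt(variance / n);
  const t = T_95_TWO_TAILED[n - 1] ?? 1.96; // critical value for df = n - 1
  return { mean, plusMinus: t * standardError };
}

// Example: five runs of one test, in milliseconds (made-up numbers).
const { mean, plusMinus } = confidenceInterval95([102, 98, 105, 99, 101]);
console.log(`${mean.toFixed(1)} ms ± ${plusMinus.toFixed(1)} ms (95% CI)`);
```

For these five illustrative samples this prints roughly `101.0 ms ± 3.4 ms (95% CI)`; the 1.96 multiplier would report a narrower ± 2.4 ms, which is exactly the underestimate the TODO item is worried about for small sample counts.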