Diffstat (limited to 'PerformanceTests/SunSpider/TODO')
-rw-r--r--  PerformanceTests/SunSpider/TODO  70
1 files changed, 70 insertions, 0 deletions
diff --git a/PerformanceTests/SunSpider/TODO b/PerformanceTests/SunSpider/TODO
new file mode 100644
index 0000000..fee4bc0
--- /dev/null
+++ b/PerformanceTests/SunSpider/TODO
@@ -0,0 +1,70 @@
+
+* Add more test cases. Categories we'd like to cover (with reasonably
+ real-world tests, preferably not microbenchmarks) include:
+
+ (X marks the ones that are fairly well covered now).
+
+ X math (general)
+ X bitops
+ X 3-d (the math bits)
+ - crypto / encoding
+ X string processing
+ - regexps
+ - date processing
+ - array processing
+ - control flow
+ - function calls / recursion
+ - object access (unclear if it is possible to make a realistic
+ benchmark that isolates this)
+
+  I'd specifically like to add all the Computer Language Shootout
+  tests that Mozilla is using.
+
+* Normalize tests. Most of the available test cases have a repeat
+  count of some sort, so the time they take can be tuned. The tests
+  should be tuned so that each category contributes about the same
+  total, and so each test in each category contributes about the same
+  amount. The question is, what implementation should be the baseline?
+  My current thought is to either pick a specific browser on a
+  specific platform (IE 7 or Firefox 2, perhaps), or to target the
+  average that some set of same-generation release browsers get on
+  each test. The latter is more work. IE 7 is probably a reasonable
+  normalization target since it is the latest version of the most
+  popular browser, so results on this benchmark will tell you how much
+  you stand to gain or lose by using a different browser. A sketch of
+  the tuning arithmetic follows.
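+
+  A rough sketch in plain JavaScript; computeRepeatCounts and its
+  arguments are hypothetical names, not anything in the harness yet:
+
+    // Given baseline times (ms per iteration) on the reference
+    // browser, grouped by category, pick integer repeat counts so
+    // that each category totals about categoryBudgetMs and each test
+    // within a category contributes about the same share of it.
+    function computeRepeatCounts(baselinesByCategory, categoryBudgetMs)
+    {
+        var counts = {};
+        for (var category in baselinesByCategory) {
+            var tests = baselinesByCategory[category];
+            var testCount = 0;
+            for (var test in tests)
+                testCount++;
+            var perTestMs = categoryBudgetMs / testCount;
+            for (var test in tests)
+                counts[test] = Math.max(1, Math.round(perTestMs / tests[test]));
+        }
+        return counts;
+    }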
+
+* Instead of using the standard error, the correct way to calculate
+  a 95% confidence interval for a small sample is to use Student's
+  t-distribution.
+  <http://en.wikipedia.org/wiki/Student%27s_t-test>. Basically this
+  involves multiplying the standard error by a value from a 2-tailed
+  t-distribution table instead of by 1.96; a table is available at
+  <http://www.medcalc.be/manual/t-distribution.php>.
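+
+  A minimal sketch, again in plain JavaScript (the names are
+  hypothetical; the critical values are the standard two-tailed 95%
+  ones, and the table would need more entries for larger samples):
+
+    // Two-tailed t-distribution critical values at 95% confidence,
+    // indexed by degrees of freedom (sample size minus one).
+    var tDist95 = { 1: 12.71, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57,
+                    6: 2.45, 7: 2.36, 8: 2.31, 9: 2.26, 10: 2.23 };
+
+    // Half-width of the 95% confidence interval around the mean.
+    // Needs at least two samples.
+    function confidenceInterval95(samples)
+    {
+        var n = samples.length;
+        var sum = 0;
+        for (var i = 0; i < n; i++)
+            sum += samples[i];
+        var mean = sum / n;
+        var squaredDeviations = 0;
+        for (var i = 0; i < n; i++)
+            squaredDeviations += (samples[i] - mean) * (samples[i] - mean);
+        var stdErr = Math.sqrt(squaredDeviations / (n - 1)) / Math.sqrt(n);
+        return tDist95[n - 1] * stdErr;
+    }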
+
+* Add support to compare two different engines (or two builds of the
+  same engine) by running them interleaved.
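+
+  A sketch of the idea; runSuiteWith() and the engine handles are
+  hypothetical:
+
+    // Alternate the two engines on every pass so slow drift in the
+    // machine's state (thermal throttling, background load) affects
+    // both roughly equally instead of biasing whichever ran last.
+    var passCount = 10; // or whatever count we settle on
+    var resultsA = [];
+    var resultsB = [];
+    for (var pass = 0; pass < passCount; pass++) {
+        resultsA.push(runSuiteWith(engineA));
+        resultsB.push(runSuiteWith(engineB));
+    }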
+
+* Add support to compare two existing sets of saved results.
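+
+  Building on confidenceInterval95() from the sketch above. The
+  overlap test here is deliberately crude and conservative; a proper
+  approach would be something like Welch's t-test:
+
+    function average(samples)
+    {
+        var sum = 0;
+        for (var i = 0; i < samples.length; i++)
+            sum += samples[i];
+        return sum / samples.length;
+    }
+
+    // Compare two saved sample sets: report the ratio of the means,
+    // and call the difference significant only if it exceeds the sum
+    // of the two 95% interval half-widths.
+    function compareResults(samplesA, samplesB)
+    {
+        var meanA = average(samplesA);
+        var meanB = average(samplesB);
+        var err = confidenceInterval95(samplesA)
+                + confidenceInterval95(samplesB);
+        return { ratio: meanB / meanA,
+                 significant: Math.abs(meanB - meanA) > err };
+    }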
+
+* Allow the repeat count to be controlled from the browser-hosted
+  version and from the WebKitTools wrapper script.
+
+* Add support to run only a subset of the tests (both command-line and
+ web versions).
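+
+  One possible shape for this; filterTests and the regexp-based
+  pattern syntax are hypothetical:
+
+    // Keep only the tests whose names match a user-supplied pattern,
+    // e.g. "string" or "^3d-".
+    function filterTests(allTests, pattern)
+    {
+        var re = new RegExp(pattern);
+        var matched = [];
+        for (var i = 0; i < allTests.length; i++) {
+            if (re.test(allTests[i]))
+                matched.push(allTests[i]);
+        }
+        return matched;
+    }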
+
+* Add a profile mode for the command-line version that runs the tests
+ repeatedly in the same command-line interpreter instance, for ease
+ of profiling.
+
+* Make the browser-hosted version prettier, both in its general
+  design and perhaps by using bar graphs for the output.
+
+* Make it possible to track changes over time and to generate a graph
+  per result, showing the value and error bar for each version.
+
+* Hook up to automated testing / buildbot infrastructure.
+
+* Possibly... add the ability to download iBench from its original
+ server, pull out the JS test content, preprocess it, and add it as a
+ category to the benchmark.
+
+* Profit.