author | Elliott Hughes <enh@google.com> | 2010-02-19 15:59:26 -0800 |
---|---|---|
committer | Elliott Hughes <enh@google.com> | 2010-02-19 17:51:15 -0800 |
commit | 13a6087f31a9b1d3e2011d63ce1fbd613a99f3bf | |
tree | 4a3d063d8dad471cde5085300cf86279220085d2 /docs/html/guide/practices/design | |
parent | 41207b6eb0524c6a2fe9e85f6373785e2937e90f | |
Update the "Android Performance" documentation.

A lot of this documentation isn't even true of the G1, let alone Froyo running
on a Nexus One. Distinguish between truth and fiction, clarify where the JIT
affects things, and clarify certain confusions (such as the difference between
intrinsics and native methods).

I still need to include updated performance numbers in the final section. I
should also make the benchmark code available so that people don't have to
take our word for these things, and so it's easier for them to get an idea of
the performance of future devices and builds. (Though hopefully we can update
this every release in future.)

Anyway, just removing the untruths is a big step forward.
Diffstat (limited to 'docs/html/guide/practices/design')
-rw-r--r-- | docs/html/guide/practices/design/performance.jd | 383 |
1 files changed, 142 insertions, 241 deletions
diff --git a/docs/html/guide/practices/design/performance.jd b/docs/html/guide/practices/design/performance.jd
index ec34ac9..ab3b3d3 100644
--- a/docs/html/guide/practices/design/performance.jd
+++ b/docs/html/guide/practices/design/performance.jd
@@ -1,39 +1,37 @@
 page.title=Designing for Performance
 @jd:body
 
-<p>An Android application should be fast. Well, it's probably more accurate to
-say that it should be <em>efficient</em>. That is, it should execute as
-efficiently as possible in the mobile device environment, with its limited
-computing power and data storage, smaller screen, and constrained battery life.
-
-<p>As you develop your application, keep in mind that, while the application may
-perform well enough in your emulator, running on your dual-core development
-computer, it will not perform that well when run a mobile device — even
-the most powerful mobile device can't match the capabilities of a typical
-desktop system. For that reason, you should strive to write efficient code, to
-ensure the best possible performance on a variety of mobile devices.</p>
-
-<p>Generally speaking, writing fast or efficient code means keeping memory
-allocations to a minimum, writing tight code, and avoiding certain language and
-programming idioms that can subtly cripple performance. In object-oriented
-terms, most of this work takes place at the <em>method</em> level, on the order of
-actual lines of code, loops, and so on.</p>
+<p>An Android application should be <em>efficient</em>. It will run on a mobile
+device with limited computing power and storage, a smaller screen, and
+constrained battery life. Battery life is one reason you might want to
+optimize your app even if it already seems to run "fast enough". Battery life
+is important to users, and Android's battery usage breakdown means users will
+know if your app is responsible draining their battery.</p>
+
+<p>One of the trickiest problems you'll face when optimizing Android apps is
+that it's not generally the case that you can say "device X is a factor F
+faster/slower than device Y".
+This is especially true if one of the devices is the emulator, or one of the
+devices has a JIT. If you want to know how your app performs on a given device,
+you need to test it on that device. Drawing conclusions from the emulator is
+particularly dangerous, as is attempting to compare JIT versus non-JIT
+performance: the performance <em>profiles</em> can differ wildly.</p>
 
 <p>This document covers these topics: </p>
 <ul>
   <li><a href="#intro">Introduction</a></li>
   <li><a href="#optimize_judiciously">Optimize Judiciously</a></li>
   <li><a href="#object_creation">Avoid Creating Objects</a></li>
-  <li><a href="#native_methods">Use Native Methods</a></li>
-  <li><a href="#prefer_virtual">Prefer Virtual Over Interface</a></li>
+  <li><a href="#myths">Performance Myths</a></li>
   <li><a href="#prefer_static">Prefer Static Over Virtual</a></li>
   <li><a href="#internal_get_set">Avoid Internal Getters/Setters</a></li>
-  <li><a href="#cache_fields">Cache Field Lookups</a></li>
-  <li><a href="#use_final">Declare Constants Final</a></li>
-  <li><a href="#foreach">Use Enhanced For Loop Syntax With Caution</a></li>
-  <li><a href="#avoid_enums">Avoid Enums</a></li>
+  <li><a href="#use_final">Use Static Final For Constants</a></li>
+  <li><a href="#foreach">Use Enhanced For Loop Syntax</a></li>
+  <li><a href="#avoid_enums">Avoid Enums Where You Only Need Ints</a></li>
   <li><a href="#package_inner">Use Package Scope with Inner Classes</a></li>
-  <li><a href="#avoidfloat">Avoid Float</a> </li>
+  <li><a href="#avoidfloat">Use Floating-Point Judiciously</a> </li>
+  <li><a href="#library">Know And Use The Libraries</a></li>
+  <li><a href="#native_methods">Use Native Methods Judiciously</a></li>
   <li><a href="#samples">Some Sample Performance Numbers</a> </li>
   <li><a href="#closing_notes">Closing Notes</a></li>
 </ul>
@@ -41,43 +39,17 @@ actual lines of code, loops, and so on.</p>
 <a name="intro" id="intro"></a>
 <h2>Introduction</h2>
 
-<p>There are two basic rules for resource-constrained systems:</p>
+<p>There are two basic rules for writing efficient code:</p>
 
 <ul>
   <li>Don't do work that you don't need to do.</li>
   <li>Don't allocate memory if you can avoid it.</li>
 </ul>
 
-<p>All the tips below follow from these two basic tenets.</p>
-
-<p>Some would argue that much of the advice on this page amounts to "premature
-optimization." While it's true that micro-optimizations sometimes make it
-harder to develop efficient data structures and algorithms, on embedded
-devices like handsets you often simply have no choice. For instance, if you
-bring your assumptions about VM performance on desktop machines to Android,
-you're quite likely to write code that exhausts system memory. This will bring
-your application to a crawl — let alone what it will do to other programs
-running on the system!</p>
-
-<p>That's why these guidelines are important. Android's success depends on
-the user experience that your applications provide, and that user experience
-depends in part on whether your code is responsive and snappy, or slow and
-aggravating. Since all our applications will run on the same devices, we're
-all in this together, in a way. Think of this document as like the rules of
-the road you had to learn when you got your driver's license: things run
-smoothly when everybody follows them, but when you don't, you get your car
-smashed up.</p>
-
-<p>Before we get down to brass tacks, a brief observation: nearly all issues
-described below are valid whether or not the VM features a JIT compiler. If I
-have two methods that accomplish the same thing, and the interpreted execution
-of foo() is faster than bar(), then the compiled version of foo() will
-probably be as fast or faster than compiled bar(). It is unwise to rely on a
-compiler to "save" you and make your code fast enough.</p>
-
 <h2 id="optimize_judiciously">Optimize Judiciously</h2>
 
-<p>As you get started thinking about how to design your application, consider
+<p>As you get started thinking about how to design your application, and as
+you write it, consider
 the cautionary points about optimization that Josh Bloch makes in his book
 <em>Effective Java</em>. Here's "Item 47: Optimize Judiciously", excerpted from
 the latest edition of the book with permission. Although Josh didn't have
@@ -273,8 +245,8 @@ examples of things that can help:</p>
   instead of creating a short-lived temporary object.</li>
 </ul>
 
-<p>A somewhat more radical idea is to slice up multidimensional arrays into parallel
-single one-dimension arrays:</p>
+<p>A somewhat more radical idea is to slice up multidimensional arrays into
+parallel single one-dimension arrays:</p>
 
 <ul>
   <li>An array of ints is a much better than an array of Integers,
@@ -294,49 +266,32 @@ single one-dimension arrays:</p>
 can. Fewer objects created mean less-frequent garbage collection, which has a
 direct impact on user experience.</p>
 
-<a name="native_methods" id="native_methods"></a>
-<h2>Use Native Methods</h2>
-
-<p>When processing strings, don't hesitate to use specialty methods like
-String.indexOf(), String.lastIndexOf(), and their cousins. These are typically
-implemented in C/C++ code that easily runs 10-100x faster than doing the same
-thing in a Java loop.</p>
-
-<p>The flip side of that advice is that punching through to a native
-method is more expensive than calling an interpreted method. Don't use native
-methods for trivial computation, if you can avoid it.</p>
-
-<a name="prefer_virtual" id="prefer_virtual"></a>
-<h2>Prefer Virtual Over Interface</h2>
+<a name="myths" id="myths"></a>
+<h2>Performance Myths</h2>
 
-<p>Suppose you have a HashMap object. You can declare it as a HashMap or as
-a generic Map:</p>
+<p>Previous versions of this document made various misleading claims. We
+address some of them here.</p>
 
-<pre>Map myMap1 = new HashMap();
-HashMap myMap2 = new HashMap();</pre>
+<p>On devices without a JIT, it is true that invoking methods via a
+variable with an exact type rather than an interface is slightly more
+efficient. (So, for example, it was cheaper to invoke methods on a
+<code>Map map</code> than a <code>HashMap map</code>, even though in both
+cases the map was a <code>HashMap</code>.) It was not the case that this
+was 2x slower; the actual difference was more like 6% slower. Furthermore,
+the JIT makes the two effectively indistinguishable.</p>
 
-<p>Which is better?</p>
-
-<p>Conventional wisdom says that you should prefer Map, because it
-allows you to change the underlying implementation to anything that
-implements the Map interface. Conventional wisdom is correct for
-conventional programming, but isn't so great for embedded systems. Calling
-through an interface reference can take 2x longer than a virtual
-method call through a concrete reference.</p>
-
-<p>If you have chosen a HashMap because it fits what you're doing, there
-is little value in calling it a Map. Given the availability of
-IDEs that refactor your code for you, there's not much value in calling
-it a Map even if you're not sure where the code is headed. (Again, though,
-public APIs are an exception: a good API usually trumps small performance
-concerns.)</p>
+<p>On devices without a JIT, caching field accesses is about 20% faster than
+repeatedly accesssing the field. With a JIT, field access costs about the same
+as local access, so this isn't a worthwhile optimization unless you feel it
+makes your code easier to read. (This is true of final, static, and static
+final fields too.)
 
 <a name="prefer_static" id="prefer_static"></a>
 <h2>Prefer Static Over Virtual</h2>
 
-<p>If you don't need to access an object's fields, make your method static. It can
-be called faster, because it doesn't require a virtual method table
-indirection. It's also good practice, because you can tell from the method
+<p>If you don't need to access an object's fields, make your method static.
+Invocations will be about 15%-20% faster.
+It's also good practice, because you can tell from the method
 signature that calling the method can't alter the object's state.</p>
 
 <a name="internal_get_set" id="internal_get_set"></a>
@@ -354,62 +309,14 @@ common object-oriented programming practices and have getters and setters
 in the public interface, but within a class you should always access
 fields directly.</p>
 
-<a name="cache_fields" id="cache_fields"></a>
-<h2>Cache Field Lookups</h2>
-
-<p>Accessing object fields is much slower than accessing local variables.
-Instead of writing:</p>
-<pre>for (int i = 0; i < this.mCount; i++)
-    dumpItem(this.mItems[i]);</pre>
-
-<p>You should write:</p>
-<pre>  int count = this.mCount;
-  Item[] items = this.mItems;
-
-  for (int i = 0; i < count; i++)
-      dumpItems(items[i]);
-</pre>
-
-<p>(We're using an explicit "this" to make it clear that these are
-member variables.)</p>
-
-<p>A similar guideline is never call a method in the second clause of a "for"
-statement. For example, the following code will execute the getCount() method
-once per iteration, which is a huge waste when you could have simply cached
-the value as an int:</p>
-
-<pre>for (int i = 0; i < this.getCount(); i++)
-    dumpItems(this.getItem(i));
-</pre>
-
-<p>It's also usually a good idea to create a local variable if you're going to be
-accessing an instance field more than once. For example:</p>
-
-<pre>
-    protected void drawHorizontalScrollBar(Canvas canvas, int width, int height) {
-        if (isHorizontalScrollBarEnabled()) {
-            int size = <strong>mScrollBar</strong>.getSize(<em>false</em>);
-            if (size <= 0) {
-                size = mScrollBarSize;
-            }
-            <strong>mScrollBar</strong>.setBounds(0, <em>height</em> - size, width, height);
-            <strong>mScrollBar</strong>.setParams(
-                    computeHorizontalScrollRange(),
-                    computeHorizontalScrollOffset(),
-                    computeHorizontalScrollExtent(), <em>false</em>);
-            <strong>mScrollBar</strong>.draw(canvas);
-        }
-    }</pre>
-
-<p>That's four separate lookups of the member field <code>mScrollBar</code>.
-By caching mScrollBar in a local stack variable, the four member field lookups
-become four stack variable references, which are much more efficient.</p>
-
-<p>Incidentally, method arguments have the same performance characteristics
-as local variables.</p>
+<p>Without a JIT, direct field access is about 3x faster than invoking a
+trivial getter. With the JIT (where direct field access is as cheap as
+accessing a local), direct field access is about 7x faster than invoking a
+trivial getter. This is true in Froyo, but will improve in the future when
+the JIT inlines getter methods.</p>
 
 <a name="use_final" id="use_final"></a>
-<h2>Declare Constants Final</h2>
+<h2>Use Static Final For Constants</h2>
 
 <p>Consider the following declaration at the top of a class:</p>
 
@@ -429,39 +336,40 @@ lookups.</p>
 static final String strVal = "Hello, world!";</pre>
 
 <p>The class no longer requires a <code><clinit></code> method,
-because the constants go into classfile static field initializers, which are
-handled directly by the VM. Code accessing <code>intVal</code> will use
+because the constants go into static field initializers in the dex file.
+Code that refers to <code>intVal</code> will use
 the integer value 42 directly, and accesses to <code>strVal</code> will
 use a relatively inexpensive "string constant" instruction instead of a
-field lookup.</p>
-
-<p>Declaring a method or class "final" does not confer any immediate
-performance benefits, but it does allow certain optimizations. For example, if
-the compiler knows that a "getter" method can't be overridden by a sub-class,
-it can inline the method call.</p>
-
-<p>You can also declare local variables final. However, this has no definitive
-performance benefits. For local variables, only use "final" if it makes the
-code clearer (or you have to, e.g. for use in an anonymous inner class).</p>
+field lookup. (Note that this optimization only applies to primitive types and
+<code>String</code> constants, not arbitrary reference types. Still, it's good
+practice to declare constants <code>static final</code> whenever possible.)</p>
 
 <a name="foreach" id="foreach"></a>
-<h2>Use Enhanced For Loop Syntax With Caution</h2>
+<h2>Use Enhanced For Loop Syntax</h2>
 
-<p>The enhanced for loop (also sometimes known as "for-each" loop) can be used for collections that implement the Iterable interface.
-With these objects, an iterator is allocated to make interface calls
-to hasNext() and next(). With an ArrayList, you're better off walking through
-it directly, but for other collections the enhanced for loop syntax will be equivalent
-to explicit iterator usage.</p>
+<p>The enhanced for loop (also sometimes known as "for-each" loop) can be used
+for collections that implement the Iterable interface and for arrays.
+With collections, an iterator is allocated to make interface calls
+to hasNext() and next(). With an ArrayList, a hand-written counted loop is
+about 3x faster (with or without JIT), but for other collections the enhanced
+for loop syntax will be exactly equivalent to explicit iterator usage.</p>
 
-<p>Nevertheless, the following code shows an acceptable use of the enhanced for loop:</p>
+<p>There are several alternatives for iterating through an array:</p>
 
 <pre>public class Foo {
     int mSplat;
-    static Foo mArray[] = new Foo[27];
+}
+public class ArrayBenchmark {
+    Foo[] mArray = new Foo[27];
+    {
+        for (int i = 0; i < mArray.length; ++i) {
+            mArray[i] = new Foo();
+        }
+    }
 
     public static void zero() {
         int sum = 0;
-        for (int i = 0; i < mArray.length; i++) {
+        for (int i = 0; i < mArray.length; ++i) {
             sum += mArray[i].mSplat;
         }
     }
@@ -471,53 +379,51 @@ to explicit iterator usage.</p>
         Foo[] localArray = mArray;
         int len = localArray.length;
 
-        for (int i = 0; i < len; i++) {
+        for (int i = 0; i < len; ++i) {
             sum += localArray[i].mSplat;
         }
     }
 
     public static void two() {
         int sum = 0;
-        for (Foo a: mArray) {
+        for (Foo a : mArray) {
             sum += a.mSplat;
         }
     }
 }</pre>
 
-<p><strong>zero()</strong> retrieves the static field twice and gets the array
-length once for every iteration through the loop.</p>
+<p><strong>zero()</strong> is slowest, because the JIT can't yet optimize away
+the cost of getting the array length once for every iteration through the
+loop.</p>
 
-<p><strong>one()</strong> pulls everything out into local variables, avoiding
-the lookups.</p>
+<p><strong>one()</strong> is faster. It pulls everything out into local
+variables, avoiding the lookups. Only the array length offers a performance
+benefit.</p>
 
-<p><strong>two()</strong> uses the enhanced for loop syntax introduced in version 1.5 of
-the Java programming language. The code generated by the compiler takes care
-of copying the array reference and the array length to local variables, making
-it a good choice for walking through all elements of an array. It does
-generate an extra local load/store in the main loop (apparently preserving
-"a"), making it a teensy bit slower and 4 bytes longer than one().</p>
+<p><strong>two()</strong> is fastest for devices without a JIT, and
+indistinguishable from <strong>one()</strong> for devices with a JIT.
+It uses the enhanced for loop syntax introduced in version 1.5 of the Java
+programming language.</p>
 
-<p>To summarize all that a bit more clearly: enhanced for loop syntax performs well
-with arrays, but be cautious when using it with Iterable objects since there is
-additional object creation.</p>
+<p>To summarize: use the enhanced for loop by default, but consider a
+hand-written counted loop for performance-critical ArrayList iteration.</p>
+
+<p>(See also <em>Effective Java</em> item 46.)</p>
 
 <a name="avoid_enums" id="avoid_enums"></a>
-<h2>Avoid Enums</h2>
+<h2>Avoid Enums Where You Only Need Ints</h2>
 
 <p>Enums are very convenient, but unfortunately can be painful when size
 and speed matter. For example, this:</p>
 
-<pre>public class Foo {
-   public enum Shrubbery { GROUND, CRAWLING, HANGING }
-}</pre>
+<pre>public enum Shrubbery { GROUND, CRAWLING, HANGING }</pre>
 
-<p>turns into a 900 byte .class file (Foo$Shrubbery.class). On first use, the
+<p>adds 740 bytes to your .dex file compared to the equivalent class
+with three public static final ints. On first use, the
 class initializer invokes the <init> method on objects representing each
 of the enumerated values. Each object gets its own static field, and the full set
 is stored in an array (a static field called "$VALUES"). That's a lot of
-code and data, just for three integers.</p>
-
-<p>This:</p>
+code and data, just for three integers. Additionally, this:</p>
 
 <pre>Shrubbery shrub = Shrubbery.GROUND;</pre>
 
@@ -529,34 +435,11 @@ some compile-time value checking. So, the usual trade-off applies: you should
 by all means use enums for public APIs, but try to avoid them when performance
 matters.</p>
 
-<p>In some circumstances it can be helpful to get enum integer values
-through the <code>ordinal()</code> method. For example, replace:</p>
-
-<pre>for (int n = 0; n < list.size(); n++) {
-    if (list.items[n].e == MyEnum.VAL_X)
-        // do stuff 1
-    else if (list.items[n].e == MyEnum.VAL_Y)
-        // do stuff 2
-}</pre>
-
-<p>with:</p>
-
-<pre>  int valX = MyEnum.VAL_X.ordinal();
-  int valY = MyEnum.VAL_Y.ordinal();
-  int count = list.size();
-  MyItem items = list.items();
-
-  for (int n = 0; n < count; n++)
-  {
-      int valItem = items[n].e.ordinal();
-
-      if (valItem == valX)
-          // do stuff 1
-      else if (valItem == valY)
-          // do stuff 2
-  }</pre>
-
-<p>In some cases, this will be faster, though this is not guaranteed.</p>
+<p>If you're using <code>Enum.ordinal</code>, that's usually a sign that you
+should be using ints instead. As a rule of thumb, if an enum doesn't have a
+constructor and doesn't define its own methods, and it's used in
+performance-critical code, you should consider <code>static final int</code>
+constants instead.</p>
 
 <a name="package_inner" id="package_inner"></a>
 <h2>Use Package Scope with Inner Classes</h2>
@@ -588,10 +471,11 @@ that directly accesses a private method and a private instance field
 in the outer class. This is legal, and the code prints "Value is 27" as
 expected.</p>
 
-<p>The problem is that Foo$Inner is technically (behind the scenes) a totally
-separate class, which makes direct access to Foo's private
-members illegal. To bridge that gap, the compiler generates a
-couple of synthetic methods:</p>
+<p>The problem is that the VM considers direct access to Foo's private members
+from Foo$Inner to be illegal because Foo and Foo$Inner are different classes,
+even though the Java language allows an inner class to access an outer class'
+private members. To bridge the gap, the compiler generates a couple of
+synthetic methods:</p>
 
 <pre>/*package*/ static int Foo.access$100(Foo foo) {
     return foo.mValue;
@@ -612,31 +496,53 @@ accesses, so this is an example of a certain language idiom resulting in an
 by inner classes to have package scope, rather than private scope.
 This runs faster and removes the overhead of the generated methods.
 (Unfortunately it also means the fields could be accessed directly by other
-classes in the same package, which runs counter to the standard OO
+classes in the same package, which runs counter to the standard
 practice of making all fields private. Once again, if you're
 designing a public API you might want to carefully consider using this
 optimization.)</p>
 
 <a name="avoidfloat" id="avoidfloat"></a>
-<h2>Avoid Float</h2>
+<h2>Use Floating-Point Judiciously</h2>
 
-<p>Before the release of the Pentium CPU, it was common for game authors to do
-as much as possible with integer math. With the Pentium, the floating point
-math co-processor became a built-in feature, and by interleaving integer and
-floating-point operations your game would actually go faster than it would
-with purely integer math. The common practice on desktop systems is to use
-floating point freely.</p>
+<p>As a rule of thumb, floating-point is about 2x slower than integer on
+Android devices. This is true on a FPU-less, JIT-less G1 and a Nexus One with
+an FPU and the JIT. (Of course, absolute speed difference between those two
+devices is about 10x for arithmetic operations.)</p>
 
-<p>Unfortunately, embedded processors frequently do not have hardware floating
-point support, so all operations on "float" and "double" are performed in
-software. Some basic floating point operations can take on the order of a
-millisecond to complete.</p>
+<p>In speed terms, there's no difference between <code>float</code> and
+<code>double</code> on the more modern hardware. Space-wise, <code>double</code>
+is 2x larger. As with desktop machines, assuming space isn't an issue, you
+should prefer <code>double</code> to <code>float</code>.</p>
 
 <p>Also, even for integers, some chips have hardware multiply but lack hardware
 divide. In such cases, integer division and modulus operations are performed in
 software — something to think about if you're designing a hash table or
 doing lots of math.</p>
 
+<a name="library" id="library"></a>
+<h2>Know And Use The Libraries</h2>
+
+<p>In addition to all the usual reasons to prefer library code over rolling
+your own, bear in mind that the system is at liberty to replace calls
+to library methods with hand-coded assembler, which may be better than the
+best code the JIT can produce for the equivalent Java. The typical example
+here is <code>String.indexOf</code> and friends, which Dalvik replaces with
+an inlined intrinsic. Similarly, the <code>System.arraycopy</code> method
+is about 9x faster than a hand-coded loop on a Nexus One with the JIT.</p>
+
+<p>(See also <em>Effective Java</em> item 47.)</p>
+
+<a name="native_methods" id="native_methods"></a>
+<h2>Use Native Methods Judiciously</h2>
+
+<p>Native code isn't necessarily more efficient than Java. There's a cost
+associated with the Java-native transition, it can be significantly more
+difficult to arrange timely collection of your native resources, and you
+need to compile your code for each architecture you wish to run on (rather
+than rely on it having a JIT).</p>
+
+<p>(See also <em>Effective Java</em> item 54.)</p>
+
 <a name="samples" id="samples"></a>
 <h2>Some Sample Performance Numbers</h2>
 
@@ -714,11 +620,6 @@ local variable.</p>
 <a name="closing_notes" id="closing_notes"></a>
 <h2>Closing Notes</h2>
 
-<p>The best way to write good, efficient code for embedded systems is to
-understand what the code you write really does. If you really want to allocate
-an iterator, by all means use enhanced for loop syntax on a List; just make it a
-deliberate choice, not an inadvertent side effect.</p>
-
-<p>Forewarned is forearmed! Know what you're getting into! Insert your
-favorite maxim here, but always think carefully about what your code is doing,
-and be on the lookout for ways to speed it up.</p>
+<p>One last thing: always measure. Before you start optimizing, make sure you
+have a problem. Make sure you can accurately measure your existing performance,
+or you won't be able to measure the benefit of the alternatives you try.</p>
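
A minimal Java sketch of the "Prefer Static Over Virtual" and "Avoid Internal Getters/Setters" advice in the updated page. The class `Counter` and its methods are hypothetical, chosen only for illustration: the public accessors stay for outside callers, code inside the class touches the field directly, and a helper that needs no instance state is declared `static`.

```java
// Hypothetical class illustrating two pieces of advice from the updated page.
final class Counter {
    private int mCount;

    // Public accessors remain part of the API for callers outside the class.
    public int getCount() {
        return mCount;
    }

    public void setCount(int count) {
        mCount = count;
    }

    // Inside the class, read and write the field directly rather than
    // going through getCount()/setCount().
    void addAll(int[] deltas) {
        for (int d : deltas) {
            mCount += d;
        }
    }

    // Uses no instance state, so it is static; the signature also tells
    // readers that calling it cannot change any Counter's state.
    static int clamp(int value, int min, int max) {
        return Math.max(min, Math.min(max, value));
    }
}
```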
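
The `ArrayBenchmark` sample added by this diff declares `mArray` as an instance field but reads it from `static` methods, which javac rejects as a reference to a non-static field from a static context. A compilable sketch of the same three loop styles follows; making `zero()`, `one()` and `two()` instance methods that return their sums, and filling `mSplat` with `i` as test data, are assumptions of this sketch rather than part of the commit.

```java
// Adapted from the ArrayBenchmark sample in the diff. Assumptions: the three
// methods are instance methods (so they may read mArray) and return the sum.
class Foo {
    int mSplat;
}

class ArrayBenchmark {
    Foo[] mArray = new Foo[27];
    {
        for (int i = 0; i < mArray.length; ++i) {
            mArray[i] = new Foo();
            mArray[i].mSplat = i; // hypothetical test data
        }
    }

    // Reads the field mArray, and its length, on every iteration.
    int zero() {
        int sum = 0;
        for (int i = 0; i < mArray.length; ++i) {
            sum += mArray[i].mSplat;
        }
        return sum;
    }

    // Copies the array reference and its length into locals first.
    int one() {
        int sum = 0;
        Foo[] localArray = mArray;
        int len = localArray.length;
        for (int i = 0; i < len; ++i) {
            sum += localArray[i].mSplat;
        }
        return sum;
    }

    // Enhanced for loop over an array: the JLS defines it as an indexed loop
    // over a local copy of the array reference, so no Iterator is created.
    int two() {
        int sum = 0;
        for (Foo a : mArray) {
            sum += a.mSplat;
        }
        return sum;
    }
}
```

All three return the same value; per the updated text, only their speed differs, and that difference depends on whether the device has a JIT.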
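
For collections, the updated text recommends the enhanced for loop by default and a hand-written counted loop only for performance-critical `ArrayList` iteration. A sketch of both forms follows; `ListSums` and the string-length workload are hypothetical. The counted version deliberately takes an `ArrayList` rather than a `List`: `get(i)` is cheap on an array-backed list but can be linear-time on a `LinkedList`, so the counted loop should not be applied to arbitrary lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers comparing the two loop forms on a collection.
final class ListSums {
    // Default choice: the enhanced for loop. For a List this obtains an
    // Iterator and calls hasNext()/next() through the Iterator interface.
    static int totalLengthForEach(List<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    // Hot-path alternative for an ArrayList: a counted loop with the size
    // hoisted into a local, so no Iterator object is allocated.
    static int totalLengthCounted(ArrayList<String> words) {
        int total = 0;
        int size = words.size();
        for (int i = 0; i < size; ++i) {
            total += words.get(i).length();
        }
        return total;
    }
}
```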
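
Where an enum has no constructor or methods of its own, the updated text suggests `static final int` constants instead. A sketch of that replacement for the `Shrubbery` example; the class layout and the `isValid`/`name` helpers are hypothetical. Because the fields are compile-time constants, javac copies their values into each use site, and they can label `switch` cases. The price is the loss of the enum's type checking, which is why the sketch validates values explicitly.

```java
// Hypothetical int-constant replacement for
// `public enum Shrubbery { GROUND, CRAWLING, HANGING }`.
final class Shrubbery {
    // Compile-time constants: javac inlines these values where they are used.
    static final int GROUND = 0;
    static final int CRAWLING = 1;
    static final int HANGING = 2;

    private Shrubbery() {
        // Not instantiable; the class only groups the constants.
    }

    // Without an enum, any int can be passed in, so check at API boundaries.
    static boolean isValid(int shrub) {
        return shrub >= GROUND && shrub <= HANGING;
    }

    static String name(int shrub) {
        switch (shrub) {
            case GROUND:
                return "GROUND";
            case CRAWLING:
                return "CRAWLING";
            case HANGING:
                return "HANGING";
            default:
                throw new IllegalArgumentException("not a shrubbery: " + shrub);
        }
    }
}
```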
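
A sketch of the "Use Package Scope with Inner Classes" advice. `Outer`, `Inner`, `mValue` and `doStuff` are hypothetical names modeled on the `Foo`/`Foo$Inner` discussion in the diff. With `mValue` and `doStuff` declared package-private instead of `private`, the inner class may reach them directly, so the compiler has no reason to emit synthetic accessors such as the `Foo.access$100` shown above; the cost is that other classes in the same package can see them too.

```java
// Hypothetical outer/inner pair; members used by Inner are package-private.
final class Outer {
    /*package*/ int mValue = 27;

    /*package*/ int doStuff(int value) {
        return value * 2;
    }

    final class Inner {
        int compute() {
            // Direct access to package-private members of the enclosing
            // instance; no synthetic accessor is required for these.
            return doStuff(mValue);
        }
    }

    int run() {
        return new Inner().compute();
    }
}
```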
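
The new "Know And Use The Libraries" and "Closing Notes" text recommends `System.arraycopy` over a hand-written loop and insists on measuring before optimizing. A rough sketch of both ideas follows; `CopyBenchmark`, the array size and the repetition count are assumptions, and the lambdas need Java 8 or later. A single `System.nanoTime` interval like this has no warm-up control and no statistics, so treat it as a starting point, and, as the updated text warns, run it on the device you care about rather than drawing conclusions from the emulator.

```java
// Hypothetical micro-benchmark comparing a hand-written copy loop with
// System.arraycopy(src, srcPos, dest, destPos, length).
final class CopyBenchmark {
    static void loopCopy(int[] src, int[] dst) {
        for (int i = 0; i < src.length; ++i) {
            dst[i] = src[i];
        }
    }

    static void libraryCopy(int[] src, int[] dst) {
        System.arraycopy(src, 0, dst, 0, src.length);
    }

    // Runs the task reps times and returns the elapsed wall-clock nanoseconds.
    static long timeNanos(Runnable task, int reps) {
        long start = System.nanoTime();
        for (int i = 0; i < reps; ++i) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int[] src = new int[4096];
        final int[] dst = new int[src.length];
        for (int i = 0; i < src.length; ++i) {
            src[i] = i;
        }
        // Timings vary by device, VM and JIT state; compare them only relative
        // to each other on the same device.
        long loop = timeNanos(() -> loopCopy(src, dst), 1000);
        long lib = timeNanos(() -> libraryCopy(src, dst), 1000);
        System.out.println("loop: " + loop + " ns, arraycopy: " + lib + " ns");
    }
}
```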