page.title=Designing for Performance
@jd:body

An Android application should be fast. Well, it's probably more accurate to say that it should be efficient. That is, it should execute as efficiently as possible in the mobile device environment, with its limited computing power and data storage, smaller screen, and constrained battery life.

As you develop your application, keep in mind that, while the application may perform well enough in your emulator, running on your dual-core development computer, it will not perform that well when run on a mobile device — even the most powerful mobile device can't match the capabilities of a typical desktop system. For that reason, you should strive to write efficient code, to ensure the best possible performance on a variety of mobile devices.

Generally speaking, writing fast or efficient code means keeping memory allocations to a minimum, writing tight code, and avoiding certain language and programming idioms that can subtly cripple performance. In object-oriented terms, most of this work takes place at the method level, on the order of actual lines of code, loops, and so on.

This document covers these topics:

- Introduction
- Optimize Judiciously
- Avoid Creating Objects
- Use Native Methods
- Prefer Virtual Over Interface
- Prefer Static Over Virtual
- Avoid Internal Getters/Setters
- Cache Field Lookups
- Declare Constants Final
- Use Enhanced For Loop Syntax With Caution
- Avoid Enums
- Use Package Scope with Inner Classes
- Avoid Float
- Some Sample Performance Numbers
- Closing Notes

Introduction

There are two basic rules for resource-constrained systems:

- Don't do work that you don't need to do.
- Don't allocate memory if you can avoid it.

All the tips below follow from these two basic tenets.

Some would argue that much of the advice on this page amounts to "premature optimization." While it's true that micro-optimizations sometimes make it harder to develop efficient data structures and algorithms, on embedded devices like handsets you often simply have no choice. For instance, if you bring your assumptions about VM performance on desktop machines to Android, you're quite likely to write code that exhausts system memory. This will bring your application to a crawl — let alone what it will do to other programs running on the system!

That's why these guidelines are important. Android's success depends on the user experience that your applications provide, and that user experience depends in part on whether your code is responsive and snappy, or slow and aggravating. Since all our applications will run on the same devices, we're all in this together, in a way. Think of this document as the rules of the road you had to learn when you got your driver's license: things run smoothly when everybody follows them, but when you don't, you get your car smashed up.

Before we get down to brass tacks, a brief observation: nearly all issues described below are valid whether or not the VM features a JIT compiler. If I have two methods that accomplish the same thing, and the interpreted execution of foo() is faster than bar(), then the compiled version of foo() will probably be as fast or faster than compiled bar(). It is unwise to rely on a compiler to "save" you and make your code fast enough.

Optimize Judiciously

As you get started thinking about how to design your application, consider the cautionary points about optimization that Josh Bloch makes in his book Effective Java. Here's "Item 47: Optimize Judiciously", excerpted from the latest edition of the book with permission. Although Josh didn't have Android application development in mind when writing this section — for example, the java.awt.Component class referenced is not available in Android, and Android uses the Dalvik VM, rather than a standard JVM — his points are still valid.

There are three aphorisms concerning optimization that everyone should know. They are perhaps beginning to suffer from overexposure, but in case you aren't yet familiar with them, here they are:

More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason—including blind stupidity.

—William A. Wulf 1

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

—Donald E. Knuth 2

We follow two rules in the matter of optimization:

Rule 1. Don't do it.
Rule 2 (for experts only). Don't do it yet.

—M. A. Jackson 3

All of these aphorisms predate the Java programming language by two decades. They tell a deep truth about optimization: it is easy to do more harm than good, especially if you optimize prematurely. In the process, you may produce software that is neither fast nor correct and cannot easily be fixed.

Don't sacrifice sound architectural principles for performance. Strive to write good programs rather than fast ones. If a good program is not fast enough, its architecture will allow it to be optimized. Good programs embody the principle of information hiding: where possible, they localize design decisions within individual modules, so individual decisions can be changed without affecting the remainder of the system (Item 13).

This does not mean that you can ignore performance concerns until your program is complete. Implementation problems can be fixed by later optimization, but pervasive architectural flaws that limit performance can be impossible to fix without rewriting the system. Changing a fundamental facet of your design after the fact can result in an ill-structured system that is difficult to maintain and evolve. Therefore you must think about performance during the design process.

Strive to avoid design decisions that limit performance. The components of a design that are most difficult to change after the fact are those specifying interactions between modules and with the outside world. Chief among these design components are APIs, wire-level protocols, and persistent data formats. Not only are these design components difficult or impossible to change after the fact, but all of them can place significant limitations on the performance that a system can ever achieve.

Consider the performance consequences of your API design decisions. Making a public type mutable may require a lot of needless defensive copying (Item 39). Similarly, using inheritance in a public class where composition would have been appropriate ties the class forever to its superclass, which can place artificial limits on the performance of the subclass (Item 16). As a final example, using an implementation type rather than an interface in an API ties you to a specific implementation, even though faster implementations may be written in the future (Item 52).

The effects of API design on performance are very real. Consider the getSize method in the java.awt.Component class. The decision that this performance-critical method was to return a Dimension instance, coupled with the decision that Dimension instances are mutable, forces any implementation of this method to allocate a new Dimension instance on every invocation. Even though allocating small objects is inexpensive on a modern VM, allocating millions of objects needlessly can do real harm to performance.

In this case, several alternatives existed. Ideally, Dimension should have been immutable (Item 15); alternatively, the getSize method could have been replaced by two methods returning the individual primitive components of a Dimension object. In fact, two such methods were added to the Component API in the 1.2 release for performance reasons. Preexisting client code, however, still uses the getSize method and still suffers the performance consequences of the original API design decisions.

Luckily, it is generally the case that good API design is consistent with good performance. It is a very bad idea to warp an API to achieve good performance. The performance issue that caused you to warp the API may go away in a future release of the platform or other underlying software, but the warped API and the support headaches that come with it will be with you for life.

Once you've carefully designed your program and produced a clear, concise, and well-structured implementation, then it may be time to consider optimization, assuming you're not already satisfied with the performance of the program.

Recall that Jackson's two rules of optimization were "Don't do it," and "(for experts only). Don't do it yet." He could have added one more: measure performance before and after each attempted optimization. You may be surprised by what you find. Often, attempted optimizations have no measurable effect on performance; sometimes, they make it worse. The main reason is that it's difficult to guess where your program is spending its time. The part of the program that you think is slow may not be at fault, in which case you'd be wasting your time trying to optimize it. Common wisdom says that programs spend 80 percent of their time in 20 percent of their code.

Profiling tools can help you decide where to focus your optimization efforts. Such tools give you runtime information, such as roughly how much time each method is consuming and how many times it is invoked. In addition to focusing your tuning efforts, this can alert you to the need for algorithmic changes. If a quadratic (or worse) algorithm lurks inside your program, no amount of tuning will fix the problem. You must replace the algorithm with one that is more efficient. The more code in the system, the more important it is to use a profiler. It's like looking for a needle in a haystack: the bigger the haystack, the more useful it is to have a metal detector. The JDK comes with a simple profiler and modern IDEs provide more sophisticated profiling tools.

The need to measure the effects of attempted optimization is even greater on the Java platform than on more traditional platforms, because the Java programming language does not have a strong performance model. The relative costs of the various primitive operations are not well defined. The "semantic gap" between what the programmer writes and what the CPU executes is far greater than in traditional statically compiled languages, which makes it very difficult to reliably predict the performance consequences of any optimization. There are plenty of performance myths floating around that turn out to be half-truths or outright lies.

Not only is Java's performance model ill-defined, but it varies from JVM implementation to JVM implementation, from release to release, and from processor to processor. If you will be running your program on multiple JVM implementations or multiple hardware platforms, it is important that you measure the effects of your optimization on each. Occasionally you may be forced to make trade-offs between performance on different JVM implementations or hardware platforms.

To summarize, do not strive to write fast programs — strive to write good ones; speed will follow. Do think about performance issues while you're designing systems and especially while you're designing APIs, wire-level protocols, and persistent data formats. When you've finished building the system, measure its performance. If it's fast enough, you're done. If not, locate the source of the problems with the aid of a profiler, and go to work optimizing the relevant parts of the system. The first step is to examine your choice of algorithms: no amount of low-level optimization can make up for a poor choice of algorithm. Repeat this process as necessary, measuring the performance after every change, until you're satisfied.

—Excerpted from Josh Bloch's Effective Java, Second Ed. (Addison-Wesley, 2008).

1 Wulf, W. A Case Against the GOTO. Proceedings of the 25th ACM National Conference 2 (1972): 791–797.

2 Knuth, Donald. Structured Programming with go to Statements. Computing Surveys 6 (1974): 261–301.

3 Jackson, M. A. Principles of Program Design, Academic Press, London, 1975. ISBN: 0123790506.

Avoid Creating Objects

Object creation is never free. A generational GC with per-thread allocation pools for temporary objects can make allocation cheaper, but allocating memory is always more expensive than not allocating memory.

If you allocate objects in a user interface loop, you will force a periodic garbage collection, creating little "hiccups" in the user experience.

Thus, you should avoid creating object instances you don't need to. Some examples of things that can help:

- If a method returns a String that the caller will immediately append to a StringBuffer anyway, change the signature so the method appends directly into the caller's buffer instead of creating a short-lived temporary object (see the sketch below).
- When extracting strings from a larger block of input data, return a substring of the original rather than creating a copy; the new String object shares the underlying character data with the original.
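For example, a method that would otherwise build and return a short-lived String can append into a builder supplied by the caller. The sketch below illustrates the idea; the class and method names are made up for this example:

    class LineFormatter {
        // Allocates a temporary String (and an implicit builder) on every call.
        static String getUserLine(String name, int score) {
            return name + ": " + score;
        }

        // Appends into a buffer supplied by the caller; no temporary objects
        // are created here.
        static void appendUserLine(StringBuilder dest, String name, int score) {
            dest.append(name).append(": ").append(score);
        }
    }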

A somewhat more radical idea is to slice up multidimensional arrays into parallel one-dimensional arrays:

- An array of ints is much better than an array of Integer objects, and two parallel arrays of ints are much more efficient than an array of (int,int) objects. The same goes for any combination of primitive types (see the sketch below).
- If you need a container that stores tuples of (Foo,Bar) objects, remember that two parallel Foo[] and Bar[] arrays are generally much more efficient than a single array of custom (Foo,Bar) objects. (The exception is when you're designing an API for other code to access; in that case it's usually better to trade a little speed for clean API design. In your own internal code, though, try to be as efficient as possible.)
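As a rough illustration of the parallel-array idea (the class and sizes here are hypothetical):

    class PointStore {
        // One object per element: every Point is a separate heap allocation,
        // and reading a coordinate costs an extra indirection.
        static class Point { int x; int y; }
        Point[] points = new Point[1000];

        // The same data as two parallel primitive arrays: two allocations total,
        // with no per-element objects for the garbage collector to track.
        int[] xs = new int[1000];
        int[] ys = new int[1000];
    }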

Generally speaking, avoid creating short-term temporary objects if you can. Fewer objects created mean less-frequent garbage collection, which has a direct impact on user experience.

Use Native Methods

When processing strings, don't hesitate to use specialty methods like String.indexOf(), String.lastIndexOf(), and their cousins. These are typically implemented in C/C++ code that easily runs 10-100x faster than doing the same thing in a Java loop.
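For instance, finding a character with String.indexOf() beats scanning the string yourself. A small sketch, with hypothetical helper names:

    class CharFinder {
        // Hand-rolled scan: an interpreted loop with one charAt() call per character.
        static int find(String s, char c) {
            for (int i = 0; i < s.length(); i++) {
                if (s.charAt(i) == c) {
                    return i;
                }
            }
            return -1;
        }

        // Preferred: String.indexOf() does the same work in native code.
        static int findFast(String s, char c) {
            return s.indexOf(c);
        }
    }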

The flip side of that advice is that punching through to a native method is more expensive than calling an interpreted method. Don't use native methods for trivial computation, if you can avoid it.

Prefer Virtual Over Interface

Suppose you have a HashMap object. You can declare it as a HashMap or as a generic Map:

Map myMap1 = new HashMap();
HashMap myMap2 = new HashMap();

Which is better?

Conventional wisdom says that you should prefer Map, because it allows you to change the underlying implementation to anything that implements the Map interface. Conventional wisdom is correct for conventional programming, but isn't so great for embedded systems. Calling through an interface reference can take 2x longer than a virtual method call through a concrete reference.

If you have chosen a HashMap because it fits what you're doing, there is little value in calling it a Map. Given the availability of IDEs that refactor your code for you, there's not much value in calling it a Map even if you're not sure where the code is headed. (Again, though, public APIs are an exception: a good API usually trumps small performance concerns.)

Prefer Static Over Virtual

If you don't need to access an object's fields, make your method static. It can be called faster, because it doesn't require a virtual method table indirection. It's also good practice, because you can tell from the method signature that calling the method can't alter the object's state.
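A minimal sketch of the idea (the class and method are hypothetical):

    class MathHelper {
        // Touches no instance fields, so it can (and should) be static:
        // cheaper to call, and the signature makes clear that it can't
        // alter any object's state.
        static int clamp(int value, int min, int max) {
            if (value < min) return min;
            if (value > max) return max;
            return value;
        }
    }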

Avoid Internal Getters/Setters

In native languages like C++ it's common practice to use getters (e.g. i = getCount()) instead of accessing the field directly (i = mCount). This is an excellent habit for C++, because the compiler can usually inline the access, and if you need to restrict or debug field access you can add the code at any time.

On Android, this is a bad idea. Virtual method calls are expensive, much more so than instance field lookups. It's reasonable to follow common object-oriented programming practices and have getters and setters in the public interface, but within a class you should always access fields directly.
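A small sketch of the pattern, using a hypothetical class: keep the accessor in the public interface, but skip it inside the class.

    class Counter {
        private int mCount;

        // Public accessor for code outside the class.
        public int getCount() {
            return mCount;
        }

        public void addThree() {
            // Inside the class, access the field directly instead of calling
            // getCount(): a virtual method call costs far more than an
            // instance field lookup.
            mCount = mCount + 3;
        }
    }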

Cache Field Lookups

Accessing object fields is much slower than accessing local variables. Instead of writing:

for (int i = 0; i < this.mCount; i++)
      dumpItem(this.mItems[i]);

You should write:

  int count = this.mCount;
  Item[] items = this.mItems;

  for (int i = 0; i < count; i++)
      dumpItem(items[i]);

(We're using an explicit "this" to make it clear that these are member variables.)

A similar guideline is to never call a method in the second clause of a "for" statement. For example, the following code will execute the getCount() method once per iteration, which is a huge waste when you could have simply cached the value as an int:

for (int i = 0; i < this.getCount(); i++)
    dumpItem(this.getItem(i));

It's also usually a good idea to create a local variable if you're going to be accessing an instance field more than once. For example:

    protected void drawHorizontalScrollBar(Canvas canvas, int width, int height) {
        if (isHorizontalScrollBarEnabled()) {
            int size = mScrollBar.getSize(false);
            if (size <= 0) {
                size = mScrollBarSize;
            }
            mScrollBar.setBounds(0, height - size, width, height);
            mScrollBar.setParams(
                    computeHorizontalScrollRange(),
                    computeHorizontalScrollOffset(),
                    computeHorizontalScrollExtent(), false);
            mScrollBar.draw(canvas);
        }
    }

That's four separate lookups of the member field mScrollBar. By caching mScrollBar in a local stack variable, the four member field lookups become four stack variable references, which are much more efficient.
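For example, the method above could be rewritten to look up mScrollBar once. This sketch assumes the field is a ScrollBarDrawable, since its declared type isn't shown above:

    protected void drawHorizontalScrollBar(Canvas canvas, int width, int height) {
        if (isHorizontalScrollBarEnabled()) {
            ScrollBarDrawable scrollBar = mScrollBar;   // one field lookup
            int size = scrollBar.getSize(false);
            if (size <= 0) {
                size = mScrollBarSize;
            }
            scrollBar.setBounds(0, height - size, width, height);
            scrollBar.setParams(
                    computeHorizontalScrollRange(),
                    computeHorizontalScrollOffset(),
                    computeHorizontalScrollExtent(), false);
            scrollBar.draw(canvas);
        }
    }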

Incidentally, method arguments have the same performance characteristics as local variables.

Declare Constants Final

Consider the following declaration at the top of a class:

static int intVal = 42;
static String strVal = "Hello, world!";

The compiler generates a class initializer method, called <clinit>, that is executed when the class is first used. The method stores the value 42 into intVal, and extracts a reference from the classfile string constant table for strVal. When these values are referenced later on, they are accessed with field lookups.

We can improve matters with the "final" keyword:

static final int intVal = 42;
static final String strVal = "Hello, world!";

The class no longer requires a <clinit> method, because the constants go into classfile static field initializers, which are handled directly by the VM. Code accessing intVal will use the integer value 42 directly, and accesses to strVal will use a relatively inexpensive "string constant" instruction instead of a field lookup.

Declaring a method or class "final" does not confer any immediate performance benefits, but it does allow certain optimizations. For example, if the compiler knows that a "getter" method can't be overridden by a sub-class, it can inline the method call.

You can also declare local variables final. However, this has no definitive performance benefits. For local variables, only use "final" if it makes the code clearer (or you have to, e.g. for use in an anonymous inner class).

Use Enhanced For Loop Syntax With Caution

The enhanced for loop (also sometimes known as "for-each" loop) can be used for collections that implement the Iterable interface. With these objects, an iterator is allocated to make interface calls to hasNext() and next(). With an ArrayList, you're better off walking through it directly, but for other collections the enhanced for loop syntax will be equivalent to explicit iterator usage.
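For instance, walking an ArrayList by index avoids allocating the Iterator that the for-each form would create. A small sketch with hypothetical helpers:

    import java.util.ArrayList;

    class ListWalker {
        // Walking by index: no Iterator object is allocated.
        static int totalLength(ArrayList<String> words) {
            int total = 0;
            int count = words.size();
            for (int i = 0; i < count; i++) {
                total += words.get(i).length();
            }
            return total;
        }

        // The for-each form allocates an Iterator and calls hasNext()/next()
        // through the Iterator interface on every step.
        static int totalLengthForEach(ArrayList<String> words) {
            int total = 0;
            for (String word : words) {
                total += word.length();
            }
            return total;
        }
    }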

Nevertheless, the following code shows an acceptable use of the enhanced for loop:

public class Foo {
    int mSplat;
    static Foo[] mArray = new Foo[27];

    public static void zero() {
        int sum = 0;
        for (int i = 0; i < mArray.length; i++) {
            sum += mArray[i].mSplat;
        }
    }

    public static void one() {
        int sum = 0;
        Foo[] localArray = mArray;
        int len = localArray.length;

        for (int i = 0; i < len; i++) {
            sum += localArray[i].mSplat;
        }
    }

    public static void two() {
        int sum = 0;
        for (Foo a: mArray) {
            sum += a.mSplat;
        }
    }
}

zero() retrieves the static field twice and gets the array length once for every iteration through the loop.

one() pulls everything out into local variables, avoiding the lookups.

two() uses the enhanced for loop syntax introduced in version 1.5 of the Java programming language. The code generated by the compiler takes care of copying the array reference and the array length to local variables, making it a good choice for walking through all elements of an array. It does generate an extra local load/store in the main loop (apparently preserving "a"), making it a teensy bit slower and 4 bytes longer than one().

To summarize all that a bit more clearly: enhanced for loop syntax performs well with arrays, but be cautious when using it with Iterable objects since there is additional object creation.

Avoid Enums

Enums are very convenient, but unfortunately can be painful when size and speed matter. For example, this:

public class Foo {
   public enum Shrubbery { GROUND, CRAWLING, HANGING }
}

turns into a 900 byte .class file (Foo$Shrubbery.class). On first use, the class initializer invokes the <init> method on objects representing each of the enumerated values. Each object gets its own static field, and the full set is stored in an array (a static field called "$VALUES"). That's a lot of code and data, just for three integers.

This:

Shrubbery shrub = Shrubbery.GROUND;

causes a static field lookup. If "GROUND" were a static final int, the compiler would treat it as a known constant and inline it.
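For internal, performance-sensitive code, the traditional alternative is a set of integer constants. A sketch (the class name is made up; the constant names mirror the enum above):

    class Shrub {
        // Compile-time constants: inlined at each use site, so no class
        // initializer, no objects, and no field lookups are required.
        public static final int GROUND = 0;
        public static final int CRAWLING = 1;
        public static final int HANGING = 2;
    }

    int shrub = Shrub.GROUND;   // compiles to a constant load of 0, not a field lookup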

The flip side, of course, is that with enums you get nicer APIs and some compile-time value checking. So, the usual trade-off applies: you should by all means use enums for public APIs, but try to avoid them when performance matters.

In some circumstances it can be helpful to get enum integer values through the ordinal() method. For example, replace:

for (int n = 0; n < list.size(); n++) {
    if (list.items()[n].e == MyEnum.VAL_X) {
        // do stuff 1
    } else if (list.items()[n].e == MyEnum.VAL_Y) {
        // do stuff 2
    }
}

with:

int valX = MyEnum.VAL_X.ordinal();
int valY = MyEnum.VAL_Y.ordinal();
int count = list.size();
MyItem[] items = list.items();

for (int n = 0; n < count; n++) {
    int valItem = items[n].e.ordinal();

    if (valItem == valX) {
        // do stuff 1
    } else if (valItem == valY) {
        // do stuff 2
    }
}

In some cases, this will be faster, though this is not guaranteed.

Use Package Scope with Inner Classes

Consider the following class definition:

public class Foo {
    private int mValue;

    public void run() {
        Inner in = new Inner();
        mValue = 27;
        in.stuff();
    }

    private void doStuff(int value) {
        System.out.println("Value is " + value);
    }

    private class Inner {
        void stuff() {
            Foo.this.doStuff(Foo.this.mValue);
        }
    }
}

The key things to note here are that we define an inner class (Foo$Inner) that directly accesses a private method and a private instance field in the outer class. This is legal, and the code prints "Value is 27" as expected.

The problem is that Foo$Inner is technically (behind the scenes) a totally separate class, which makes direct access to Foo's private members illegal. To bridge that gap, the compiler generates a couple of synthetic methods:

/*package*/ static int Foo.access$100(Foo foo) {
    return foo.mValue;
}
/*package*/ static void Foo.access$200(Foo foo, int value) {
    foo.doStuff(value);
}

The inner-class code calls these static methods whenever it needs to access the "mValue" field or invoke the "doStuff" method in the outer class. What this means is that the code above really boils down to a case where you're accessing member fields through accessor methods instead of directly. Earlier we talked about how accessors are slower than direct field accesses, so this is an example of a certain language idiom resulting in an "invisible" performance hit.

We can avoid this problem by declaring fields and methods accessed by inner classes to have package scope, rather than private scope. This runs faster and removes the overhead of the generated methods. (Unfortunately it also means the fields could be accessed directly by other classes in the same package, which runs counter to the standard OO practice of making all fields private. Once again, if you're designing a public API you might want to carefully consider using this optimization.)
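A sketch of that change applied to the example above: dropping "private" from the members the inner class touches gives them package scope, so the compiler no longer needs to generate the synthetic accessor methods.

    public class Foo {
        /*package*/ int mValue;          // package scope: Foo$Inner reads it directly

        public void run() {
            Inner in = new Inner();
            mValue = 27;
            in.stuff();
        }

        /*package*/ void doStuff(int value) {   // package scope: called directly by Foo$Inner
            System.out.println("Value is " + value);
        }

        private class Inner {
            void stuff() {
                Foo.this.doStuff(Foo.this.mValue);
            }
        }
    }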

Avoid Float

Before the release of the Pentium CPU, it was common for game authors to do as much as possible with integer math. With the Pentium, the floating point math co-processor became a built-in feature, and by interleaving integer and floating-point operations your game would actually go faster than it would with purely integer math. The common practice on desktop systems is to use floating point freely.

Unfortunately, embedded processors frequently do not have hardware floating point support, so all operations on "float" and "double" are performed in software. Some basic floating point operations can take on the order of a millisecond to complete.

Also, even for integers, some chips have hardware multiply but lack hardware divide. In such cases, integer division and modulus operations are performed in software — something to think about if you're designing a hash table or doing lots of math.
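For example, if a hash table is kept at a power-of-two size, the bucket index can be computed with a bitwise AND instead of a modulus operation. A sketch, using a hypothetical table class:

    class Buckets {
        // Table length must be a power of two for the mask trick to be valid.
        private static final int TABLE_SIZE = 64;
        private final Object[] mTable = new Object[TABLE_SIZE];

        int indexFor(int hash) {
            // Equivalent to (hash % TABLE_SIZE) for non-negative hashes, but
            // uses no division, which matters on chips without a hardware divider.
            return hash & (TABLE_SIZE - 1);
        }
    }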

Some Sample Performance Numbers

To illustrate some of our ideas, here is a table listing the approximate run times for a few basic actions. Note that these values should NOT be taken as absolute numbers: they are a combination of CPU and wall clock time, and will change as improvements are made to the system. However, it is worth noting how these values apply relative to each other — for example, adding a member variable currently takes about four times as long as adding a local variable.

Action                                                   Time
Add a local variable                                        1
Add a member variable                                       4
Call String.length()                                        5
Call empty static native method                             5
Call empty static method                                   12
Call empty virtual method                                12.5
Call empty interface method                                15
Call Iterator.next() on a HashMap                         165
Call put() on a HashMap                                   600
Inflate 1 View from XML                                22,000
Inflate 1 LinearLayout containing 1 TextView           25,000
Inflate 1 LinearLayout containing 6 View objects      100,000
Inflate 1 LinearLayout containing 6 TextView objects  135,000
Launch an empty activity                            3,000,000

Closing Notes

The best way to write good, efficient code for embedded systems is to understand what the code you write really does. If you really want to allocate an iterator, by all means use enhanced for loop syntax on a List; just make it a deliberate choice, not an inadvertent side effect.

Forewarned is forearmed! Know what you're getting into! Insert your favorite maxim here, but always think carefully about what your code is doing, and be on the lookout for ways to speed it up.