diff options
-rw-r--r-- | docs/html/preview/images/perf-test-frame-latency.png | bin | 0 -> 161802 bytes | |||
-rw-r--r-- | docs/html/preview/images/perf-test-framestats.png | bin | 0 -> 7785 bytes | |||
-rw-r--r-- | docs/html/preview/testing/performance.jd | 667 |
3 files changed, 667 insertions, 0 deletions
diff --git a/docs/html/preview/images/perf-test-frame-latency.png b/docs/html/preview/images/perf-test-frame-latency.png Binary files differnew file mode 100644 index 0000000..87d1cfc --- /dev/null +++ b/docs/html/preview/images/perf-test-frame-latency.png diff --git a/docs/html/preview/images/perf-test-framestats.png b/docs/html/preview/images/perf-test-framestats.png Binary files differnew file mode 100644 index 0000000..589a923 --- /dev/null +++ b/docs/html/preview/images/perf-test-framestats.png diff --git a/docs/html/preview/testing/performance.jd b/docs/html/preview/testing/performance.jd new file mode 100644 index 0000000..a61091f --- /dev/null +++ b/docs/html/preview/testing/performance.jd @@ -0,0 +1,667 @@ +page.title=Testing Display Performance + +@jd:body + + +<div id="qv-wrapper"> + <div id="qv"> + <h2>In this document</h2> + <ol> + <li><a href="#measure">Measuring UI Performance</a> + <ul> + <li><a href="#aggregate">Aggregate frame stats</a></li> + <li><a href="#timing-info">Precise frame timing info</a></li> + <li><a href="#timing-dump">Simple frame timing dump</a></li> + <li><a href="#collection-window">Controlling the window of stat collection</a></li> + <li><a href="#diagnose">Diagnosing performance regressions</a></li> + <li><a href="#resources">Additional resources</a></li> + </ul> + </li> + <li><a href="#automate">Automating UI Perfomance Tests</a> + <ul> + <li><a href="#ui-tests">Setting up UI tests</a></li> + <li><a href="#automated-tests">Setting up automated UI testing</a></li> + <li><a href="#triage">Triaging and fixing observed problems</a></li> + </ul> + </li> + </ol> + </div> +</div> + + +<p> + User interface (UI) performance testing ensures that your app not only meets its functional + requirements, but that user interactions with your app are buttery smooth, running at a + consistent 60 frames per second (<a href= + "https://www.youtube.com/watch?v=CaMTIgxCSqU&index=25&list=PLWz5rJ2EKKc9CBxr3BVjPTPoDPLdPIFCE">why + 60fps?</a>), without any dropped or delayed frames, or as we like to call it, <em>jank</em>. This + document explains tools available to measure UI performance, and lays out an approach to + integrate UI performance measurements into your testing practices. +</p> + + +<h2 id="measure">Measuring UI Performance</h2> + +<p> + In order to improve performance you first need the ability to measure the performance of + your system, and then diagnose and identify problems that may arrive from various parts of your + pipeline. +</p> + +<p> + <em><a href="https://source.android.com/devices/tech/debug/dumpsys.html">dumpsys</a></em> is an + Android tool that runs on the device and dumps interesting information about the status of system + services. Passing the <em>gfxinfo</em> command to dumpsys provides an output in logcat with + performance information relating to frames of animation that are occurring during the recording + phase. +</p> + +<pre> +> adb shell dumpsys gfxinfo <PACKAGE_NAME> +</pre> + +<p> + This command can produce multiple different variants of frame timing data. +</p> + +<h3 id="aggregate">Aggregate frame stats</h3> + +<p> + With the M Preview the command prints out aggregated analysis of frame data to logcat, collected + across the entire lifetime of the process. For example: +</p> + +<pre class="noprettyprint"> +Stats since: 752958278148ns +Total frames rendered: 82189 +Janky frames: 35335 (42.99%) +90th percentile: 34ms +95th percentile: 42ms +99th percentile: 69ms +Number Missed Vsync: 4706 +Number High input latency: 142 +Number Slow UI thread: 17270 +Number Slow bitmap uploads: 1542 +Number Slow draw: 23342 +</pre> + +<p> + These high level statistics convey at a high level the rendering performance of the app, as well + as its stability across many frames. +</p> + + +<h3 id="timing-info">Precise frame timing info</h3> + +<p> + With the M Preview comes a new command for gfxinfo, and that’s <em>framestats</em> which provides + extremely detailed frame timing information from recent frames, so that you can track down and + debug problems more accurately. +</p> + +<pre> +>adb shell dumpsys gfxinfo <PACKAGE_NAME> framestats +</pre> + +<p> + This command prints out frame timing information, with nanosecond timestamps, from the last 120 + frames produced by the app. Below is example raw output from adb dumpsys gfxinfo + <PACKAGE_NAME> framestats: +</p> + +<pre class="noprettyprint"> +0,49762224585003,49762241251670,9223372036854775807,0,49762257627204,49762257646058,49762257969704,49762258002100,49762265541631,49762273951162,49762300914808,49762303675954, +0,49762445152142,49762445152142,9223372036854775807,0,49762446678818,49762446705589,49762447268818,49762447388037,49762453551527,49762457134131,49762474889027,49762476150120, +0,49762462118845,49762462118845,9223372036854775807,0,49762462595381,49762462619287,49762462919964,49762462968454,49762476194547,49762476483454,49762480214964,49762480911527, +0,49762479085548,49762479085548,9223372036854775807,0,49762480066370,49762480099339,49762481013089,49762481085850,49762482232152,49762482478350,49762485657620,49762486116683, +</pre> + +<p> + Each line of this output represents a frame produced by the app. Each line has a fixed number of + columns describing time spent in each stage of the frame-producing pipeline. The next section + describes this format in detail, including what each column represents. +</p> + + +<h4 id="fs-data-format">Framestats data format</h4> + +<p> + Since the block of data is output in CSV format, it's very straightforward to paste it to your + spreadsheet tool of choice, or collect and parse with a script. The following table explains the + format of the output data columns. All timestamps are in nanoseconds. +</p> + +<ul> + <li>FLAGS + <ul> + <li>Rows with a ‘0’ for the FLAGS column can have their total frame time computed by + subtracting the INTENDED_VSYNC column from the FRAME_COMPLETED column. + </li> + + <li>If this is non-zero the row should be ignored, as the frame has been determined as being + an outlier from normal performance, where it is expected that layout & draw take longer + than 16ms. Here are a few reasons this could occur: + <ul> + <li>The window layout changed (such as the first frame of the application or after a + rotation) + </li> + + <li>It is also possible the frame was skipped in which case some of the values will have + garbage timestamps. A frame can be skipped if for example it is out-running 60fps or if + nothing on-screen ended up being dirty, this is not necessarily a sign of a problem in + the app. + </li> + </ul> + </li> + </ul> + </li> + + <li>VSYNC + <ul> + <li>The time value that was used in all the vsync listeners and drawing for the frame + (Choreographer frame callbacks, animations, View.getDrawingTime(), etc…) + </li> + + <li>To understand more about VSYNC and how it influences your application, check out the + <a href= + "https://www.youtube.com/watch?v=1iaHxmfZGGc&list=PLOU2XLYxmsIKEOXh5TwZEv89aofHzNCiu&index=23"> + Understanding VSYNC</a> video. + </li> + </ul> + </li> + + + <li>INTENDED_VSYNC + <ul> + <li>The intended start point for the frame. If this value is different from VSYNC, there + was work occurring on the UI thread that prevented it from responding to the vsync signal + in a timely fashion. + </li> + </ul> + </li> + + <li>OLDEST_INPUT_EVENT + <ul> + <li>The timestamp of the oldest input event in the input queue, or Long.MAX_VALUE if + there were no input events for the frame. + </li> + + <li>This value is primarily intended for platform work and has limited usefulness to app + developers. + </li> + </ul> + </li> + + <li>NEWEST_INPUT_EVENT + <ul> + <li>The timestamp of the newest input event in the input queue, or 0 if there were no + input events for the frame. + </li> + + <li>This value is primarily intended for platform work and has limited usefulness to app + developers. + </li> + + <li>However it’s possible to get a rough idea of how much latency the app is adding by + looking at (FRAME_COMPLETED - NEWEST_INPUT_EVENT). + </li> + </ul> + </li> + + <li>HANDLE_INPUT_START + <ul> + <li>The timestamp at which input events were dispatched to the application. + </li> + + <li>By looking at the time between this and ANIMATION_START it is possible to measure how + long the application spent handling input events. + </li> + + <li>If this number is high (>2ms), this indicates the app is spending an unusually + long time processing input events, such as View.onTouchEvent(), which may indicate this + work needs to be optimized, or offloaded to a different thread. Note that there are some + scenarios, such as click events that launch new activities or similar, where it is + expected and acceptable that this number is large. + </li> + </ul> + </li> + + <li>ANIMATION_START + <ul> + <li>The timestamp at which animations registered with Choreographer were run. + </li> + + <li>By looking at the time between this and PERFORM_TRANVERSALS_START it is possible to + determine how long it took to evaluate all the animators (ObjectAnimator, + ViewPropertyAnimator, and Transitions being the common ones) that are running. + </li> + + <li>If this number is high (>2ms), check to see if your app has written any custom + animators or what fields ObjectAnimators are animating and ensure they are appropriate + for an animation. + </li> + + <li>To learn more about Choreographer, check out the <a href= + "https://developers.google.com/events/io/sessions/325418001">For Butter or Worse</a> + video. + </li> + </ul> + </li> + + <li>PERFORM_TRAVERSALS_START + <ul> + <li>If you subtract out DRAW_START from this value, you can extract how long the layout + & measure phases took to complete. (note, during a scroll, or animation, you would + hope this should be close to zero..) + </li> + + <li>To learn more about the measure & layout phases of the rendering pipeline, check + out the <a href= + "https://www.youtube.com/watch?v=we6poP0kw6E&list=PLOU2XLYxmsIKEOXh5TwZEv89aofHzNCiu&index=27"> + Invalidations, Layouts and Performance</a> video + </li> + </ul> + </li> + + <li>DRAW_START + <ul> + <li>The time at which the draw phase of performTraversals started. This is the start + point of recording the display lists of any views that were invalidated. + </li> + + <li>The time between this and SYNC_START is how long it took to call View.draw() on all + the invalidated views in the tree. + </li> + + <li>For more information on the drawing model, see <a href= + "{@docRoot}guide/topics/graphics/hardware-accel.html#hardware-model">Hardware Acceleration</a> + or the <a href= + "https://www.youtube.com/watch?v=we6poP0kw6E&list=PLOU2XLYxmsIKEOXh5TwZEv89aofHzNCiu&index=27"> + Invalidations, Layouts and Performance</a> video + </li> + </ul> + </li> + + <li>SYNC_START + <ul> + <li>The time at which the sync phase of the drawing started. + </li> + + <li>If the time between this and ISSUE_DRAW_COMMANDS_START is substantial (>0.4ms or + so), it typically indicates a lot of new Bitmaps were drawn which must be uploaded to the + GPU. + </li> + + <li>To understand more about the sync phase, check out the <a href= + "https://www.youtube.com/watch?v=VzYkVL1n4M8&index=24&list=PLOU2XLYxmsIKEOXh5TwZEv89aofHzNCiu"> + Profile GPU Rendering</a> video + </li> + </ul> + </li> + + <li>ISSUE_DRAW_COMMANDS_START + <ul> + <li>The time at which the hardware renderer started issuing drawing commands to the GPU. + </li> + + <li>The time between this and FRAME_COMPLETED gives a rough idea of how much GPU work the + app is producing. Problems like too much overdraw or inefficient rendering effects show + up here. + </li> + </ul> + </li> + + <li>SWAP_BUFFERS + <ul> + <li>The time at which eglSwapBuffers was called, relatively uninteresting outside of + platform work. + </li> + </ul> + </li> + + <li>FRAME_COMPLETED + <ul> + <li>All done! The total time spent working on this frame can be computed by doing + FRAME_COMPLETED - INTENDED_VSYNC. + </li> + </ul> + </li> + +</ul> + +<p> + You can use this data in different ways. One simple but useful visualization is a + histogram showing the distribution of frames times (FRAME_COMPLETED - INTENDED_VSYNC) in + different latency buckets, see figure below. This graph tells us at a glance that most + frames were very good - well below the 16ms deadline (depicted in red), but a few frames + were significantly over the deadline. We can look at changes in this histogram over time + to see wholesale shifts or new outliers being created. You can also graph input latency, + time spent in layout, or other similar interesting metrics based on the many timestamps + in the data. +</p> + +<img src="{@docRoot}preview/images/perf-test-framestats.png"> + + +<h3 id="timing-dump">Simple frame timing dump</h3> + +<p> + If <strong>Profile GPU rendering</strong> is set to <strong>In adb shell dumpsys gfxinfo</strong> + in Developer Options, the <code>adb shell dumpsys gfxinfo</code> command prints out timing + information for the most recent 120 frames, broken into a few different categories with + tab-separated-values. This data can be useful for indicating which parts of the drawing pipeline + may be slow at a high level. +</p> + +<p> + Similar to <a href="#fs-data-format">framestats</a> above, it's very + straightforward to paste it to your spreadsheet tool of choice, or collect and parse with + a script. The following graph shows a breakdown of where many frames produced by the app + were spending their time. +</p> + +<img src="{@docRoot}preview/images/perf-test-frame-latency.png"> + +<p> + The result of running gfxinfo, copying the output, pasting it into a spreadsheet + application, and graphing the data as stacked bars. +</p> + +<p> + Each vertical bar represents one frame of animation; its height represents the number of + milliseconds it took to compute that frame of animation. Each colored segment of the bar + represents a different stage of the rendering pipeline, so that you can see what parts of + your application may be creating a bottleneck. For more information on understanding the + rendering pipeline, and how to optimize for it, see the <a href= + "https://www.youtube.com/watch?v=we6poP0kw6E&index=27&list=PLWz5rJ2EKKc9CBxr3BVjPTPoDPLdPIFCE"> + Invalidations Layouts and Performance</a> video. +</p> + + +<h3 id="collection-window">Controlling the window of stat collection</h3> + +<p> + Both the framestats and simple frame timings gather data over a very short window - about + two seconds worth of rendering. In order to precisely control this window of time - for + example, to constrain the data to a particular animation - you can reset all counters, + and aggregate statistics gathered. +</p> + +<pre> +>adb shell dumpsys gfxinfo <PACKAGE_NAME> reset +</pre> + +<p> + This can also be used in conjunction with the dumping commands themselves to collect and + reset at a regular cadence, capturing less-than-two-second windows of frames + continuously. +</p> + + +<h3 id="diagnose">Diagnosing performance regressions</h3> + +<p> + Identification of regressions is a good first step to tracking down problems, and + maintaining high application health. However, dumpsys just identifies the existence and + relative severity of problems. You still need to diagnose the particular cause of the + performance problems, and find appropriate ways to fix them. For that, it’s highly + recommended to use the <a href="{@docRoot}tools/help/systrace.html">systrace</a> tool. +</p> + + +<h3 id="resources">Additional resources</h3> + +<p> + For more information on how Android’s rendering pipeline works, common problems that you + can find there, and how to fix them, some of the following resources may be useful to + you: +</p> + +<ul> + <li>Rendering Performance 101 + </li> + <li>Why 60fps? + </li> + <li>Android UI and the GPU + </li> + <li>Invalidations Layouts and performance + </li> + <li>Analyzing UI Performance with Systrace + </li> +</ul> + + +<h2 id="automate">Automating UI Perfomance Tests</h2> + +<p> + One approach to UI Performance testing is to simply have a human tester perform a set of + user operations on the target app, and either visually look for jank, or spend an very + large amount of time using a tool-driven approach to find it. But this manual approach is + fraught with peril - human ability to perceive frame rate changes varies tremendously, + and this is also time consuming, tedious, and error prone. +</p> + +<p> + A more efficient approach is to log and analyze key performance metrics from automated UI + tests. The Android M developer preview includes new logging capabilities which make it + easy to determine the amount and severity of jank in your application’s animations, and + that can be used to build a rigorous process to determine your current performance and + track future performance objectives. +</p> + +<p> + This article walks you through a recommended approach to using that data to automate your + performance testing. +</p> + +<p> + This is mostly broken down into two key actions. Firstly, identifying what you're + testing, and how you’re testing it. and Secondly, setting up, and maintaining an + automated testing environment. +</p> + + +<h3 id="ui-tests">Setting up UI tests</h3> + +<p> + Before you can get started with automated testing, it’s important to determine a few high + level decisions, in order to properly understand your test space, and needs you may have. +</p> + +<h4> + Identify key animations / flows to test +</h4> + +<p> + Remember that bad performance is most visible to users when it interrupts a smooth + animation. As such, when identifying what types of UI actions to test for, it’s useful to + focus on the key animations that users see most, or are most important to their + experience. For example, here are some common scenarios that may be useful to identify: +</p> + +<ul> + <li>Scrolling a primary ListView or RecyclerView + </li> + + <li>Animations during async wait cycles + </li> + + <li>Any animation that may have bitmap loading / manipulation in it + </li> + + <li>Animations including Alpha Blending + </li> + + <li>Custom View drawing with Canvas + </li> +</ul> + +<p> + Work with engineers, designers, and product managers on your team to prioritize these key + product animations for test coverage. +</p> + +<h4> + Define your future objectives and track against them +</h4> + +<p> + From a high-level, it may be critical to identify your specific performance goals, and + focus on writing tests, and collecting data around them. For example: +</p> + +<ul> + <li>Do you just want to begin tracking UI performance for the first time to learn more? + </li> + + <li>Do you want to prevent regressions that might be introduced in the future? + </li> + + <li>Are you at 90% of smooth frames today and want to get to 98% this quarter? + </li> + + <li>Are you at 98% smooth frames and don’t want to regress? + </li> + + <li>Is your goal to improve performance on low end devices? + </li> +</ul> + +<p> + In all of these cases, you’ll want historical tracking which shows performance across + multiple versions of your application. +</p> + +<h4> + Identify devices to test on +</h4> + +<p> + Application performance varies depending on the device it's running on. Some devices may + contain less memory, less powerful GPUs, or slower CPU chips. This means that animations + which may perform well on one set of hardware, may not on others, and worse, may be a + result of a bottleneck in a different part of the pipeline. So, to account for this + variation in what a user might see, pick a range of devices to execute tests on, both + current high end devices, low end devices, tablets, etc. Look for variation in CPU + performance, RAM, screen density, size, and so on. Tests that pass on a high end device + may fail on a low end device. +</p> + +<h4> + Basic frameworks for UI Testing +</h4> + +<p> + Tool suites like <a href= + "https://developer.android.com/tools/testing-support-library/index.html">UIAutomator</a>, + and <a href="https://code.google.com/p/android-test-kit/">Espresso</a> are built to help + automate the action of a user moving through your application. These are simple + frameworks which mimic user interaction with your device. To use these frameworks, you + effectively create unique scripts, which run through a set of user-actions, and play them + out on the device itself. +</p> + +<p> + By combining these automated tests, alongside <code>dumpsys gfxinfo</code> you can quickly + create a reproducible system that allows you to execute a test, and measure the + performance information of that particular condition. +</p> + + +<h3 id="automated-tests">Setting up automated UI testing</h3> + +<p> + Once you have the ability to execute a UI test, and a pipeline to gather the data from a + single test, the next important step is to embrace a framework which can execute that + test multiple times, across multiple devices, and aggregate the resulting performance + data for further analysis by your development team. +</p> + +<h4> + A framework for test automation +</h4> + +<p> + It’s worth noting that UI testing frameworks (like <a href= + "https://developer.android.com/tools/testing-support-library/index.html">UIAutomator</a>) + run on the target device/emulator directly. While performance gathering information done + by <em>dumpsys gfxinfo</em> is driven by a host machine, sending commands over ADB. To + help bridge the automation of these separate entities, <a href= + "{@docRoot}tools/help/monkeyrunner_concepts.html">MonkeyRunner</a> framework was + developed; A scripting system that runs on your host machine, which can issue commands to + a set of connected devices, as well as receive data from them. +</p> + +<p> + Building a set of scripts for proper Automation of UI Performance testing, at a minimum, + should be able to utilize monkeyRunner to accomplish the following tasks: +</p> + +<ul> + <li>Load & Launch a desired APK to a target device, devices, or emulator. + </li> + + <li>Launch a UIAutomator UI test, and allow it to be executed + </li> + + <li>Collect performance information through <em>dumpsys gfxinfo</em><em>.</em> + </li> + + <li>Aggregate information and display it back in a useful fashion to the developer. + </li> +</ul> + + +<h3 id="triage">Triaging and fixing observed problems</h3> + +<p> + Once problem patterns or regressions are identified, the next step is identifying and + applying the fix. If your automated test framework preserves precise timing breakdowns + for frames, it can help you scrutinize recent suspicious code/layout changes (in the case + of regression), or narrow down the part of the system you’re analyzing when you switch to + manual investigation. For manual investigation, <a href= + "{@docRoot}tools/help/systrace.html">systrace</a> is a great place to start, showing + precise timing information about every stage of the rendering pipeline, every thread and + core in the system, as well as any custom event markers you define. +</p> + +<h4> + Properly profiling temporal timings +</h4> + +<p> + It is important to note the difficulties in obtaining and measuring timings that come from + rendering performance. These numbers are, by nature, non deterministic, and often + fluctuate depending on the state of the system, amount of memory available, thermal + throttling, and the last time a sun flare hit your area of the earth. The point is that + you can run the same test, twice and get slightly different numbers that may be close to + each other, but not exact. +</p> + +<p> + Properly gathering and profiling data in this manner means running the same test, + multiple times, and accumulating the results as an average, or median value. (for the + sake of simplicity, let’s call this a ‘batch’) This gives you the rough approximation of + the performance of the test, while not needing exact timings. +</p> + +<p> + Batches can be used between code changes to see the relative impact of those changes on + performance. If the average frame rate for the pre-change Batch is larger than the + post-change batch, then you generally have an overall win wrt performance for that + particular change. +</p> + +<p> + This means that any Automated UI testing you do should take this concept into + consideration, and also account for any anomalies that might occur during a test. For + example, if your application performance suddenly dips, due to some device issue (that + isn’t caused by your application) then you may want to re-run the batch in order to get + less chaotic timings. +</p> + +<p> + So, how many times should you run a test, before the measurements become meaningful? 10 + times should be the minimum, with higher numbers like 50 or 100 yielding more accurate + results (of course, you’re now trading off time for accuracy) +</p> |