Optimizing and Profiling UI Performance


Performance matters if you want to provide buttery smooth, exceptional user experiences. However, performance issues can sometimes be difficult to track down and fix. In this 360AnDev talk, I’ll explain what causes jank (dropped frames) while scrolling, share tips on how to avoid it, and show how to profile for problem areas when it happens.


Introduction (0:00)

When I first started my development career, I had no idea there were tools that could help me identify problems in my software. Once I found them, many of these tools seemed really difficult to use, and they produced results I struggled to make sense of. When I was preparing this presentation, I kept those early struggles in the back of my mind. My hope is that anybody who has never profiled their code before will have the confidence to do so after reading this.

Performant UI (0:39)

What does it mean to have a performant UI? A UI is performant when its response time and frame rate are consistent and fast enough that the human brain perceives fluid motion, without lag or stutter. It’s more complicated than that, but this sums it up in one oversimplified sentence. We’ve all experienced poor application performance: things seem okay, and then suddenly the app starts to lag and might even come to a halt. Users can be forgiving if it’s a rare one-off, but if your application does this on a regular basis, be prepared for uninstalls and negative Play Store reviews.

On the other hand, don’t think your app has to be so perfect that it never drops a frame here or there. Software is never perfect. But your UI does have to run smoothly enough that the brain can’t perceive much, if any, difference in frame rate. If your frame rate suddenly drops for any reason, the user experiences what’s called “jank” (also known as dropped or skipped frames): unintentional pauses in motion. What’s really happening is that the user sees the same frame repeatedly, causing a noticeable hiccup in the animation. To understand how jank happens, let’s first review a few terms:

• Frame rate, or frames per second (FPS), is how many times per second the device’s hardware can draw a new image to a buffer.

• Refresh rate is the number of times per second a device’s display can update. It is measured in hertz (Hz), where one hertz is one cycle per second, so a 60 Hz display can update 60 times per second. Not all devices can render at 60 frames per second, though, and even those that can are not guaranteed to hit that exact number. Even if you’re doing all the right things, what matters most is staying within those VSync ticks.

• VSync stands for vertical synchronization. It’s the pixel crossing guard, and it operates like a heartbeat. VSync keeps the GPU’s frame rate in sync with the display’s refresh rate. If the frame rate is faster than the refresh rate, VSync throttles it so that new images are only delivered when the VSync pulse occurs and says, “Okay, go.”
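The per-frame time budget follows directly from the refresh rate: at 60 Hz, each VSync interval is 1000 / 60 ≈ 16.67 ms. The sketch below is a deliberately simplified model (it ignores buffering and how Android actually schedules work): a frame that needs k intervals is shown on the k-th VSync, so the previous image repeats k − 1 times. The `FrameBudget` class is a hypothetical helper for illustration only.

```java
// Simplified model, not Android's real scheduler: a frame that needs k VSync
// intervals appears on the k-th VSync, so the old image repeats k - 1 times.
final class FrameBudget {
    private FrameBudget() {}

    // Milliseconds available per frame for a display refreshing at refreshHz.
    static double budgetMs(double refreshHz) {
        return 1000.0 / refreshHz;
    }

    // Number of VSync intervals a frame of the given duration occupies (at least one).
    static int vsyncIntervals(double frameTimeMs, double refreshHz) {
        return Math.max(1, (int) Math.ceil(frameTimeMs / budgetMs(refreshHz)));
    }

    // Intervals in which the user sees the previous frame again ("jank").
    static int droppedFrames(double frameTimeMs, double refreshHz) {
        return vsyncIntervals(frameTimeMs, refreshHz) - 1;
    }
}
```

Under this model, a 12 ms frame on a 60 Hz display fits in one interval and drops nothing, a 20 ms frame spans two intervals and drops one, and a 40 ms frame spans three and drops two.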

VSync and Frame Buffering

To better understand the role VSync plays, we need a basic understanding of how frame buffering works. Modern systems use at least what is known as “double buffering”, where the GPU composites the information it receives from the CPU and draws the image into a frame buffer. This buffer is commonly referred to as the “back buffer”.

When VSync occurs, the image drawn in the back buffer is swapped to the front buffer, and that becomes the image the user sees on the display. If, for some reason, the GPU is still working on the back buffer when VSync occurs (instead of having finished and waiting for the swap), that frame is skipped. The user keeps seeing the previous image for that interval, or for however many VSync intervals it takes, until the back buffer is finished; the new image then appears on the next VSync.

Android has used triple buffering since Jelly Bean (Android 4.1). It works much like double buffering, except that there are potentially two back buffers to work with instead of one. To get an image on the screen, the CPU is responsible for updating a display list, which is handed off to the GPU to draw into a buffer. The GPU is also responsible for swapping these buffers so that the newly drawn image can be displayed when VSync says, “Go.”

We already know that if the GPU isn’t ready when the VSync pulse happens, a frame is dropped and the same frame is shown again. The entire rendering pipeline has to fit between VSync boundaries, which is about 16 milliseconds at 60 Hz. If rendering crosses that threshold and we drop a frame, another buffer is allocated so that we can immediately start rendering the next frame without waiting for the back buffer to become available.

You can see an example on slide 9, where we have frames 0, 1, and 2 displaying on the screen when a VSync pulse happens. However, frame 2 takes too long to render and the GPU isn’t done drawing to the back buffer. At this point, we’ve skipped a frame and a third buffer is allocated, and we can start drawing to it immediately. Even though we already skipped a frame, hopefully now we can catch up and only drop one frame instead of always being behind and struggling to catch up.

What happens if VSync is turned off? If you play games that benefit from very fast frame rates, such as real-time first-person shooters, you may have turned off VSync in the settings. Doing so disables the heartbeat that syncs your GPU’s buffer swaps to your screen’s refresh interval. The advantage is that frames per second (FPS) are no longer capped at the display’s refresh rate. The disadvantage is that nothing controls when a buffer swap happens; a swap can happen at any time. If a swap happens while the display is partway through showing a frame, you end up with what’s known as “screen tearing”: you see part of the old frame and part of the new frame at the same time.

Now that we know the basic causes of jank, how do we avoid them?

Optimize (7:35)

One of the best ways to avoid jank is to avoid work that you know adds to rendering time and might push you past those VSync boundaries. When optimizing the UI, one of the first things you should look at is your view hierarchy. You want a shallow view tree; avoid deep nesting.

Flatten View Hierarchy

On slide 13, you can see an example of a deeply nested layout. It contains a lot of useless ViewGroup parents that serve no purpose but hurt performance by adding to the measure and layout time required to render this list row. Notice at the top of slide 14 that the layout editor shows a list of color-coded parent containers wrapping the selected ImageView. This helps you see the depth of the selected child view and where each container begins and ends.

The ImageView that the red arrow points to is wrapped in one RelativeLayout and then three LinearLayouts. That’s completely unnecessary. This XML achieves exactly the same result as the layout on slide 13, but with a very shallow hierarchy:

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
  android:layout_width="match_parent"
  android:layout_height="wrap_content"
  xmlns:tools="http://schemas.android.com/tools"
  android:paddingLeft="16dp"
  android:paddingRight="16dp"
  android:orientation="horizontal">
  
  <ImageView ... />

  <TextView ... />

</LinearLayout>

You can see a side-by-side comparison of the design view screenshots for both layouts on slide 16. The optimized layout on the right looks no different from the deeply nested one on the left. But since this is a list row, the mistakes in the deeply nested layout are multiplied by every row displayed on the screen.

Hierarchy Viewer

Hierarchy Viewer is a great tool for getting an overview of your tree structure. It helps you quickly spot costly mistakes that add to rendering time. To use Hierarchy Viewer on a device, follow the setup instructions on developer.android.com. If you’re using an emulator, you’re already good to go.

You can see screenshots on slide 18 showing how to obtain a capture. First, click the Android icon to launch the DDMS Android Device Monitor. Then switch to the Hierarchy Viewer perspective if you’re not already in it, and select your process in the Windows tab on the left. Either double-click it or click the small blue tree icon to obtain a snapshot.

On slide 19, you can see what a capture looks like. That is the tree overview of that deeply nested layout that I showed earlier. In comparison, you can see on slide 20 the tree for the flattened version of the layout. You can see there’s a lot less going on.

Hierarchy Viewer can also show the relative measure, layout, and draw times for each View or ViewGroup. Click the icon with the three circles in the upper right-hand corner, and the view you select will show how many milliseconds its measure, layout, and draw took.

You’ll also notice three dots on each child view, giving you a quick visual indication of rendering performance, or at least what Hierarchy Viewer thinks it was. Clicking a child view gives you the same detail view that you saw for its parent. However, I learned from a very reliable source that these times are a lie, so I wouldn’t rely on this feature at all.

As you can see on slide 22, when I click the RecyclerView node, there are 49 views: 48 child views plus the RecyclerView itself. You can see how quickly mistakes add up. On slide 23 we have the flat layout, with only 13 views. Some views were eliminated just by converting the text and heart icon into a single TextView with a compound drawable; the rest were eliminated by removing the extra depth created by ViewGroups that served no purpose. I can’t stress this enough: because it’s a RecyclerView row, every mistake (or, conversely, every optimization) is multiplied by the number of rows that are inflated.
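To see why these counts multiply, here is a framework-free sketch of a view tree. `ViewNode` and `HierarchyDemo` are hypothetical stand-ins, not Android classes. The flat row mirrors the optimized XML, a LinearLayout holding an ImageView and a TextView. The nested fragment reproduces the ImageView wrapped in one RelativeLayout and three LinearLayouts, with the nesting order assumed. If we assume four rows were inflated, which fits both counts but isn’t stated in the talk, the flat row’s three views give 1 + 4 × 3 = 13, and 49 would correspond to 12 views per nested row.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, framework-free stand-in for an Android View/ViewGroup.
final class ViewNode {
    final String name;
    final List<ViewNode> children;

    ViewNode(String name, ViewNode... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }

    // Number of views in this subtree, including this node.
    int count() {
        int total = 1;
        for (ViewNode child : children) {
            total += child.count();
        }
        return total;
    }

    // Length of the longest root-to-leaf path; a leaf has depth 1.
    int depth() {
        int deepest = 0;
        for (ViewNode child : children) {
            deepest = Math.max(deepest, child.depth());
        }
        return deepest + 1;
    }
}

final class HierarchyDemo {
    private HierarchyDemo() {}

    // Flat row, as in the optimized XML: one LinearLayout with two children.
    static ViewNode flatRow() {
        return new ViewNode("LinearLayout",
                new ViewNode("ImageView"),
                new ViewNode("TextView"));
    }

    // Hypothetical fragment of the nested row: an ImageView wrapped in
    // one RelativeLayout and three LinearLayouts (nesting order assumed).
    static ViewNode nestedImage() {
        return new ViewNode("RelativeLayout",
                new ViewNode("LinearLayout",
                        new ViewNode("LinearLayout",
                                new ViewNode("LinearLayout",
                                        new ViewNode("ImageView")))));
    }

    // A RecyclerView holding `rows` inflated copies of the same row layout.
    static ViewNode recycler(int rows, Supplier<ViewNode> rowFactory) {
        ViewNode[] inflated = new ViewNode[rows];
        for (int i = 0; i < rows; i++) {
            inflated[i] = rowFactory.get();
        }
        return new ViewNode("RecyclerView", inflated);
    }
}
```

Each wrapper added to the row template adds one view per inflated row. It also lengthens the path that measure and layout have to walk.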

Know Your Views (12:19)

It’s important to choose the right view for the job and to know the pros and cons of each. For example, RelativeLayout may give you greater control, but that comes with an extra measure pass. The same goes for LinearLayout when you use the layout_weight attribute: that extra flexibility comes at a cost.

I know my deeply nested layout example seems completely contrived in this context, but it’s really easy to end up with a very similar view tree once you start reusing layouts with the <include> tag. That mistake might not be obvious until you see the graph, and Hierarchy Viewer would expose it quickly. If the root of an included layout is a ViewGroup that exists only to hold its children, you can remove that extra parent, which wastes time and space, by replacing the root with the <merge> tag.

The point is that understanding what you’re putting on the screen and how you’re doing it is critical. To learn more about layouts and choosing the right view for the job, look for Huyen Dao’s talk on loving lean layouts on YouTube if you haven’t seen it, because it’s pretty great.

GPU Overdraw

Debug GPU overdraw is a tool in developer settings that provides color-coded visual feedback to help you avoid drawing things that won’t be seen on the screen. When pixels are drawn on top of pixels that were already drawn in the same frame, this is called “overdraw”. To enable this feature:

  • Go into the developer settings.
  • Scroll down to “Debug GPU overdraw”.
  • Choose “Show overdraw areas”.

There is also a variant of this setting for people with color blindness. Once you’ve enabled overdraw debugging, you’ll see its effects immediately.

In the video of scrolling through our sample app, or on slide 28, you can see a lot of colors, especially shades of red, which are bad. You want to avoid red as much as possible. The colors mean the following:

  • No color means no overdraw is occurring at all.
  • Light blue is one layer of overdraw: one layer beneath the layer that’s visible.
  • Light green is two layers of overdraw.
  • Light red is three times overdraw, so there are three layers underneath the layer you’re actually seeing.
  • Dark red is four or more times overdraw. You definitely want to look at what you’re doing in that area.

You can’t avoid overdraw completely, but you want to minimize it as much as possible.

In my deeply nested layout, I was setting white backgrounds on all of the views. I don’t need to do that for every single view: the white background is already defined in the app’s theme, so the window already has a white background. Even though this seems really obvious, extra backgrounds are a very common cause of overdraw. By removing the white backgrounds from the views that didn’t need them, we removed a lot of the overdraw.
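The color coding comes down to counting how many times each pixel is painted in a frame. This sketch models opaque rectangles drawn in order onto a small pixel grid. `OverdrawMap`, its integer-grid rectangles, and the example coordinates are illustrative assumptions, and this is not how the GPU tool is implemented. The color names follow the description above. The model shows why a redundant white row background on top of the window background adds a whole extra layer under everything drawn above it.

```java
// Illustrative model: count how many times each pixel is painted in one frame.
final class OverdrawMap {
    private final int[][] paints;

    OverdrawMap(int width, int height) {
        paints = new int[height][width];
    }

    // Paint an opaque rectangle covering [left, right) x [top, bottom), clamped to the grid.
    void draw(int left, int top, int right, int bottom) {
        for (int y = Math.max(0, top); y < Math.min(bottom, paints.length); y++) {
            for (int x = Math.max(0, left); x < Math.min(right, paints[y].length); x++) {
                paints[y][x]++;
            }
        }
    }

    // Overdraw at a pixel: paints beyond the first (0 if painted once or never).
    int overdrawAt(int x, int y) {
        return Math.max(0, paints[y][x] - 1);
    }

    // Color names as described for "Show overdraw areas".
    static String colorFor(int overdraw) {
        switch (overdraw) {
            case 0:  return "no color";
            case 1:  return "light blue";
            case 2:  return "light green";
            case 3:  return "light red";
            default: return "dark red";
        }
    }
}
```

For example, drawing a window background, then a full-width white row background, then an image makes the image pixels light green (two layers of overdraw). Drop the row background and the same pixels become light blue.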

Also on slide 28, you may notice that the upper part of each cat image is darker than the lower part. That indicates an extra layer of overdraw, because in my XML I was setting the ImageView’s source drawable for design preview purposes.

If you plan to set the image in your Java code and you’re only setting the drawable in XML for the preview, use the tools namespace instead of the android namespace (tools:src rather than android:src), so the drawable appears only in the layout preview. The Android system helps with overdraw by not drawing views that are completely hidden, but this won’t work for views that are partially visible or for custom views that override onDraw.

For custom views that override onDraw, there’s clipRect and quickReject, both part of the Canvas API. Calling clipRect in onDraw lets you define the drawable boundaries for a view; anything that falls outside that area won’t be drawn. quickReject returns true if an area lies completely outside the clipping rectangle, so you can skip drawing it altogether.
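In a real custom view, you would call `canvas.clipRect(...)` and `canvas.quickReject(...)` inside `onDraw`. Those calls need the Android framework, so this sketch reproduces only the geometry in plain Java. It uses a hypothetical `Box` class in place of `android.graphics.Rect`, a hypothetical `quickReject` helper, and a hypothetical stack of overlapping cards, each shifted right by `step` pixels. Clipping every covered card to the strip that stays visible means each pixel is painted once.

```java
// Hypothetical stand-in for android.graphics.Rect: [left, right) x [top, bottom).
final class Box {
    final int left, top, right, bottom;

    Box(int left, int top, int right, int bottom) {
        this.left = left;
        this.top = top;
        this.right = right;
        this.bottom = bottom;
    }

    int area() {
        return Math.max(0, right - left) * Math.max(0, bottom - top);
    }

    boolean intersects(Box o) {
        return left < o.right && o.left < right && top < o.bottom && o.top < bottom;
    }

    // Overlapping region; has zero area when the boxes do not intersect.
    Box intersect(Box o) {
        return new Box(Math.max(left, o.left), Math.max(top, o.top),
                Math.min(right, o.right), Math.min(bottom, o.bottom));
    }
}

final class ClipDemo {
    private ClipDemo() {}

    // Mirrors the idea behind Canvas.quickReject: true when `bounds` lies entirely outside `clip`.
    static boolean quickReject(Box clip, Box bounds) {
        return !clip.intersects(bounds);
    }

    // Pixels painted for `count` cards of width x height, each shifted right by `step`.
    // Later cards are drawn on top; with clipping, every card except the last
    // is clipped to the strip that the next card does not cover.
    static int pixelsPainted(int count, int width, int height, int step, boolean clip) {
        int total = 0;
        for (int i = 0; i < count; i++) {
            Box card = new Box(i * step, 0, i * step + width, height);
            boolean covered = clip && i < count - 1;
            Box drawn = covered ? card.intersect(new Box(i * step, 0, (i + 1) * step, height)) : card;
            total += drawn.area();
        }
        return total;
    }
}
```

With three 100 × 50 cards offset by 30 pixels, drawing them unclipped paints 15,000 pixels. Clipping paints 8,000, which is exactly the visible area, so nothing is overdrawn.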

Profile GPU rendering

You’ve optimized as much as possible and you’re still experiencing jank. Now what? Profile GPU rendering is a great first place to get a holistic overview of UI rendering performance. To enable it, go to Settings, Developer options, and select one of the two Profile GPU rendering options. The first is “On screen as bars”. The second, “In adb shell dumpsys gfxinfo”, produces text output you can read from the command line and also feeds the graphical GPU monitor in Android Studio.
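With the second option enabled, running `adb shell dumpsys gfxinfo <package>` prints frame statistics as text. On Android 6.0 and later, that output includes summary lines such as `Total frames rendered:` and `Janky frames:`. Treat the exact line format as an assumption and check it against your own device’s output. The hypothetical `GfxInfoSummary` class below extracts those two numbers from a captured dump and computes the jank percentage.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts frame counts from captured `dumpsys gfxinfo` text.
// The line formats matched here are assumptions; verify them against real output.
final class GfxInfoSummary {
    private static final Pattern TOTAL = Pattern.compile("Total frames rendered:\\s*(\\d+)");
    private static final Pattern JANKY = Pattern.compile("Janky frames:\\s*(\\d+)");

    final long totalFrames;
    final long jankyFrames;

    private GfxInfoSummary(long totalFrames, long jankyFrames) {
        this.totalFrames = totalFrames;
        this.jankyFrames = jankyFrames;
    }

    static GfxInfoSummary parse(String dump) {
        return new GfxInfoSummary(firstNumber(TOTAL, dump), firstNumber(JANKY, dump));
    }

    // Percentage of frames flagged as janky (0 if no frames were rendered).
    double jankPercent() {
        return totalFrames == 0 ? 0.0 : 100.0 * jankyFrames / totalFrames;
    }

    private static long firstNumber(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("Line not found: " + pattern.pattern());
        }
        return Long.parseLong(m.group(1));
    }
}
```

You could save the output with `adb shell dumpsys gfxinfo <package> > gfxinfo.txt` and pass the file’s contents to `parse`. Running `adb shell dumpsys gfxinfo <package> reset` between runs clears the counters, so each capture covers only the interaction you care about.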

Once you enable “On screen as bars”, you’ll see them appear immediately (see slide 33). A vertical bar represents the entire rendering pipeline, and each color is a phase of it.

  • The green horizontal line is the 16-millisecond reference bar. If your vertical bar goes above that line, the frame took too long.

  • The orange section of the bar represents the time the CPU spends waiting for the GPU to finish its work. If this section gets tall, too much work is happening on the GPU while the CPU sits idle.

  • The red section represents the time spent by Android’s 2D renderer issuing commands to OpenGL to draw the display list.

  • The blue section is the time spent creating and updating the display list. If your bar is tall in the blue section, you’re likely doing too much work in your onDraw method. The purple section, present only on Android 5.0 and up, is the time spent transferring resources to the render thread.
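Reading a bar comes down to adding up its sections and seeing which one dominates. The sketch below encodes the hints from the list above. The `FramePhases` class, its field names, and the idea of picking a single dominant phase are illustrative assumptions. The 16 ms budget assumes a 60 Hz display.

```java
// Illustrative model of one Profile GPU Rendering bar; all times in milliseconds.
final class FramePhases {
    // The green reference line, assuming a 60 Hz display.
    static final double BUDGET_MS = 16.0;

    final double blue;   // creating and updating the display list
    final double purple; // transferring resources to the render thread
    final double red;    // 2D renderer issuing OpenGL commands
    final double orange; // CPU waiting for the GPU to finish

    FramePhases(double blue, double purple, double red, double orange) {
        this.blue = blue;
        this.purple = purple;
        this.red = red;
        this.orange = orange;
    }

    // Height of the whole vertical bar.
    double total() {
        return blue + purple + red + orange;
    }

    // True when the bar rises above the reference line.
    boolean overBudget() {
        return total() > BUDGET_MS;
    }

    // A hint based on the tallest section (ties resolve in the order checked below).
    String hint() {
        double tallest = Math.max(Math.max(blue, purple), Math.max(red, orange));
        if (tallest == orange) {
            return "GPU overloaded: the CPU is idle, waiting on it";
        }
        if (tallest == blue) {
            return "Display list work: check onDraw for too much work";
        }
        if (tallest == red) {
            return "Issuing draw commands: the 2D renderer is busy with OpenGL calls";
        }
        return "Resource transfer: lots of data going to the render thread";
    }
}
```

A 15 ms bar that is mostly blue stays under the line but still points at onDraw. A 20 ms bar dominated by orange crosses the line and points at the GPU.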

On Marshmallow, the colors have been updated again.

I couldn’t find any real documentation for the new colors, but at least they are now labeled if you’re looking at the Android Studio monitor. They still correspond to the previous colors, but there’s more granularity, with shades of green for things like measure and layout, animation, and input handling.

On slide 34 you can see side-by-side screenshots with Profile GPU Rendering on screen enabled, running on a Marshmallow device. On the left is the deeply nested layout, and on the right is the optimized layout. Both are well above the 16-millisecond reference bar, but you can see that just by eliminating the depth created by extra ViewGroups in the hierarchy, we’ve significantly improved performance.

Android Studio Monitors (19:41)

Android Studio has built-in monitors that show performance for network, GPU, CPU, and memory. Much like Profile GPU Rendering on screen as bars, the GPU monitor displays the rendering pipeline in colored sections and shows the green 16-millisecond reference bar. I find it easier to tell the shades of green apart in the monitor, plus you can zoom in on it, but it’s a touch less convenient than the onscreen version.

To run any of the monitors built into Android Studio, turn on “USB debugging” and set the debuggable flag in your Gradle build file:

android {
  buildTypes {
    debug {
      // Debug builds are already debuggable by default; the flag is set explicitly here.
      debuggable true
    }
    release {
      ...
    }
  }
}

To capture GPU stats, go back to Profile GPU rendering in the developer options and enable “In adb shell dumpsys gfxinfo”. Finally, if you’re running Android 5.0 or later, you’ll need to turn on “Enable view attribute inspection” in developer settings. If you’ve enabled everything and still aren’t getting output, make sure the DDMS device monitor isn’t running.

You can see an example on slide 39 with the memory, CPU, and GPU monitors running. The memory monitor at the top gives a quick view of your memory pressure and roughly when garbage collection occurs. The dark blue area is allocated memory, and the light blue area is free memory. Wherever allocated memory climbs and then suddenly drops off, that’s when garbage collection happened.

Systrace (21:44)

Systrace is a powerful tool for profiling UI performance. You can run it from Android Studio or from the command line. I prefer the command line, so that’s what I’m going to show you.

$ cd android-sdk/platform-tools/systrace
$ python systrace.py --time=10 -o mytrace.html sched gfx view wm

First, I change directories to the systrace folder in the Android SDK’s platform-tools. Then I run the systrace.py Python script. The --time option sets how long to collect trace data (here, 10 seconds), and the -o option writes the results to a file named mytrace.html. After that come the categories I want information about: sched (CPU scheduling), gfx (graphics), view (the View system), and wm (window manager). You can find the full list of categories at developer.android.com, because Systrace can profile a lot more than this.

On slide 42, you can see example output produced from the flattened layout with GPU overdraw removed. It shows only the running app’s process; I cropped out the rest. When Systrace runs, it isn’t just collecting a trace of your app. It also collects system activity and other apps’ events, so you want to kill other running apps before starting your trace. On slide 43, you can see a close-up of a single frame. That’s what a happy frame looks like: process, render, and update took less than 16 milliseconds.

Most of the time here was spent on the RenderThread issuing draw commands, as shown by the light green bar labeled DrawFrame. The RenderThread is only present on Android 5.0 and up. To navigate Systrace output, use the W, A, S, and D keys, kind of like playing a game: A and D move left and right, W zooms in, and S zooms out. You can also press M to highlight a section or an entire frame.

If you don’t see the vertical VSync bars (purple on my display, though they appear blue here), click View Options in the upper right-hand corner and then click Highlight VSync (see slide 45).

You’ll see each frame represented as a colored dot; either green, yellow, or red (see slide 46):

• Green is good. It means that processing, rendering, and updating the new frame were all achieved in 16 milliseconds or less.
• Yellow means you spent a little too much time, but weren’t far past the 16-millisecond mark. You may have dropped only one frame.
• Red means performance was poor: you took far too much time and might have dropped multiple frames as a result.

On slide 47, we have a highlighted area, and then I clicked on the section Draw on the UI Thread, and then hit the M key to mark it. By clicking this section of the pipeline, Systrace will provide additional information about it. I can also hold the Mac command key down to select multiple slices for inspection.

Despite optimizing our layout and removing overdraw, we still have some performance problems, as the red dot on slide 48 shows. The issue is that the images are rather large, and texture uploads to the GPU are expensive. Since these images are much larger than the size at which they’re actually drawn on screen, a good optimization would be to downsample them so that we push fewer pixels through the pipeline.
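One common way to downsample follows the pattern in the Android documentation on loading large bitmaps. First decode only the image bounds by setting `BitmapFactory.Options.inJustDecodeBounds = true` and reading `outWidth` and `outHeight`. Then choose a power-of-two `inSampleSize` that keeps the decoded bitmap at least as large as the view that displays it. The size calculation is plain Java, so it is shown on its own here. The `SampleSize` class name and the example sizes are assumptions for illustration.

```java
// Chooses a power-of-two value for BitmapFactory.Options.inSampleSize.
// Decoding with inSampleSize = n yields roughly (width / n) x (height / n) pixels.
final class SampleSize {
    private SampleSize() {}

    static int calculate(int rawWidth, int rawHeight, int reqWidth, int reqHeight) {
        int inSampleSize = 1;
        if (rawHeight > reqHeight || rawWidth > reqWidth) {
            int halfHeight = rawHeight / 2;
            int halfWidth = rawWidth / 2;
            // Keep doubling while the result would still be at least the requested size.
            while (halfHeight / inSampleSize >= reqHeight
                    && halfWidth / inSampleSize >= reqWidth) {
                inSampleSize *= 2;
            }
        }
        return inSampleSize;
    }
}
```

For example, a 2048 × 1536 image shown in a 100 × 100 view gets a sample size of 8, which decodes to about 256 × 192 instead of the full 2048 × 1536. On a device, you would then set `inJustDecodeBounds = false`, set `inSampleSize`, and decode again.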

Systrace tries to help us out by alerting us to issues. On slide 49, you can see the alert says, “Expensive bitmap uploads”, and it provides us with more information about the problem. I already knew what the problem was because I intentionally created it for the purpose of this talk. This alert is definitely on target.

This information, even though it’s generic, can sometimes prove useful to point you in the right direction and help you understand what part of the pipeline you need to focus on. You can see these alerts at the top of your Systrace report, as well. There is a row labeled Alerts, or you can click on one of the red or yellow frame dots to see the alert for it.

Conclusion (26:27)

In this talk, I didn’t cover other extremely important topics: scaling, sampling, and caching bitmaps, threading, network performance, or memory management. They all affect your app’s UI performance, and you can certainly write your own code to handle them. Or you can lean on a trusted image library that already takes many of them into account, such as Glide, Fresco, or Picasso, to name a few.

That said, an image library is not a silver bullet that suddenly solves all of your problems. If you keep experiencing performance issues, it’s time to start profiling. My goal was to inspire you to profile and optimize your app. If you haven’t done that already, I hope you will do it today.



Brenda Cook


Brenda is an Android Software Engineer currently working as a consultant. She is a former Network Engineer specializing in Cisco Networking and Security Appliances who made the leap into software in 2011. Since then, she has been developing for Android for close to five years and dabbles in other technologies including IoT. As someone who has always been passionate and excited about technology, she tries to spread that excitement and enthusiasm to others through volunteer teaching in high schools, speaking engagements and Meetup participation.