In this talk from 360|AnDev, Romain and Chet share information and techniques for making better UIs.
Introduction (0:00)
Today, we're going to talk about our obsessions. We have two parts to the talk, split right in the middle. There's the Chet part; Chet used to write a lot of code about graphics, animation and performance, and now manages the Android UI Toolkit team. And then there's the Romain part; Romain manages the Android Graphics team. In the first part, we're going to talk about animation rules, and in the second part we'll talk about color and color spaces.
Animation rules (1:45)
First, let’s talk about animation rules. There’s a book by Disney called The Illusion of Life. I would recommend that you read it. In particular, there’s a chapter called the Principles of Animation, the 12 Rules of Animation. It’s a coffee table book about the size of a coffee table. It’s a really good description of the rules and principles that Disney applied to animations when they created all their movies and shorts years and years and years ago.
A lot of those principles apply today across animations, but also across life, and also across user interfaces. We’ve actually talked about this before. In Google I/O in 2013, we had a talk where we talked about a lot of animation principles, including cartoon animation principles. If you’ve already seen that one, you might not know that we also gave that talk four years before that at Devoxx. We went over all the 12 rules, so we’re not going to regurgitate all of that stuff.
Today I wanted to focus on the principles that apply directly to stuff that you can see on the Android platform, which you should apply to your animations when you’re creating them or using them on the system. Some of the rules, since they were created for hand-drawn animations, don’t apply as easily, and I’ll go over the bulk of those rules at the end and cover the ideas behind them. I did want to spend some time on the ones that you can actually see in the platform, especially since material design, and especially since the animation capabilities in the platform have made a lot of this stuff a lot easier than it used to be a few years ago.
Staging (3:27)
Let’s start out with staging, which is at the heart of a lot of the principles here.
The idea is to connect the user or the viewer to the action that you want them to understand happening on the screen at the time. You can see this in animations and movies as well. If you notice, in a lot of movies, you have a character that wears distinctive clothing, so that when they show a crowd shot, you know that the guy in the green hat is the main character and that you’re supposed to follow their actions. There may be 300 extras on the screen, but you know where the main character is. I think they also do this so that when there’s a stunt double, they can fool you into thinking that’s actually the actor.
But there are reasons to keep your eye focused on that person because there’s so much going on that if you’re just presented with the chaos of a city scene, you’re not going to know what it is you’re supposed to pay attention to and glean from that scene. So they give you hints. They’ll focus on that character and zoom out. They’ll do other things to make sure that you understand what you’re supposed to be watching.
The same thing occurs in animation. You have a very short amount of time to get what they want to impart. This character is going to run over there, or this character is doing something secretive in the corner of the room. They want to draw your attention to that character so that you get what you need to out of that scene so that you understand the follow-on actions that are going to happen in the cartoon.
We have a simple diagram in our presentation. There are a lot of objects on the screen, this massive scene with a bunch of stuff, and then somewhere in that scene is going to be an object that will draw your attention with animation, color, something that distinguishes itself from everything around it. I wanted to show a couple of examples that you can see on the Android platform, so you get a little better sense of how this is integrated into some applications today.
The first one is from Play Music. Play Music was one of the first apps to do a lot of activity transitions. They worked really hard to integrate the idea of launching from one activity to the other with shared elements because they had this immersive experience in which you are dealing with the same media over and over again. You’re going from a list of the songs on an album into descriptions of the songs, or back out to a list of albums, and there are these common elements in between these activities. They want to make sure to bring the user along with that experience. When you tap on the album, you have this ripple animation, and then it launches. What they’re doing is taking you from one activity to the other, but instead of just erasing the screen and painting the new one in place, and then your brain has to parse the new information, they’re helping you focus on the key element, which is the album. That will take you from one screen to the other, so that you know that this is detailed information about that specific album.
We can also see this in some of the new launch animations. We tap on the icon and then we launch, roughly from the icon with an icon in the middle of the window background. So that you know this isn’t just a random blank window coming out of the screen, or you’re not presented with, automatically, the full UI of this app that you went into. Instead, it helps bring you there by showing you the icon along the way.
Android development tip: just use a Drawable for your window background. Don't do some fancy screen, like a new launch animation activity thing that has to be loaded on its own. Use drawables for window backgrounds. They're very efficient, and they get the job done without a lot of overhead.
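As a rough sketch of that tip: the launch background is typically declared in the activity's theme by pointing android:windowBackground at a drawable, and the same drawable can also be set from code. A minimal, hedged example (the resource name launch_background is hypothetical):

// Inside an Activity; R.drawable.launch_background is a placeholder name
getWindow().setBackgroundDrawableResource(R.drawable.launch_background);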
Slow in/out (7:42)
Let’s go to the next rule, slow in/out. This is about movement. In particular, it’s about timing, and more natural timing for the user to understand because the animations were about mimicking life. It’s about characters that you want to imbue with life so that the audience understands them and has some sympathy and empathy with these characters, and not just these animated things happening on the screen. It’s not just a sequence of frames, it’s actual characters, lifelike characters on the screen. You want the motion to impart the feeling that these are real beings on the screen.
Again, we have a simple animation in our presentation. We have this nonlinear thing on the top. You’re easing in, and you’re easing out. You’re accelerating into the motion, and then you’re decelerating out of it. On the bottom, you get this linear motion. You can think of the one on the top as being more natural because this is how we as human beings move, right?
If you have a computer, if you have a robot moving, they may be moving linearly. If we see an animation on the screen, that’s what we think. If it’s moving linearly, it looks mechanical to us because living beings don’t move that way. We accelerate into motions and decelerate out of them. The takeaway from the talk is that the one on the top is better, and the linear motion is, in fact, bad.
There are exceptions to this rule, and one of them is when you’re fading something, when you animate the opacity of a view, you don’t necessarily need the ease-in and ease-out. You can use a linear interpolator. There are other situations where you want linear motion. For instance, when you have huge objects onscreen. Sometimes they look better with linear motion. Don’t take that as a rule that you have to apply no matter what. Make sure to try it and see if it makes sense for the current animation.
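As a quick, hedged sketch of the fade case (the view and duration here are illustrative):

// A short fade-out where a linear timing curve is perfectly fine
view.animate()
        .alpha(0f)
        .setDuration(150)
        .setInterpolator(new LinearInterpolator())
        .start();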
The last point is that in VR, linear motion is something you want. When you have a first-person point of view in a VR environment, if you have an ease-in-ease-out motion for the camera, it feels pretty horrible. It makes me sick because your body doesn’t feel that acceleration and deceleration. So in that situation, you do want a linear motion. It doesn’t look as nice, but it doesn’t make you puke, which is a pretty good feature.
Something interesting came up during the development of Lollipop, where we were doing a lot with the UX team that involved developing some of the animation principles behind material design. They use the language of easing exactly the opposite of the way that we do. I believe theirs comes from the more traditional stuff. I think if I went back and read the Disney stuff again, it would probably match their worldview, which we think of as being wrong.
When we say ease-in-ease-out, the motion that I showed, with the ball moving from the left to the right, it accelerated in and it decelerated out. We call that ease-in-ease-out. It's easing into the interval, and then it's easing out of the interval. I don't know if people have dealt with the Flash platform and the Penner easing equations. It's a common language in programming animations. It's been around for years with software developers. In the meantime, the designers, for some reason, haven't been paying attention to us, and they use this language to mean the opposite thing. If they say ease in, they actually mean easing into the pose at the end of the interval, and if they say ease out, they mean easing out of the pose at the beginning. When they say ease-in-ease-out, they're talking about easing out of the pose at the beginning, and then easing into the one at the end. They mean decelerating out and then accelerating. It warps my mind every time I think about it. They're wrong, but if you ever get into a confusing conversation with your designer, you might just try pointing out to them that they're wrong.
Sometimes it’s confusing for other reasons. I remember working with UX designers and they had this beautiful interpolation. Then I looked at the duration of the animation. I did the math and realized there were three or four frames in the entire animation. I tried to convince them that it didn’t matter if we tried to accelerate or decelerate, because when you have three frames, there is no way you can see the acceleration and the deceleration. That’s another reason for you to use linear interpolation. If the animation is really, really fast, don’t even bother.
Fortunately, we make it easier for you to have nonlinear timing or any timing that you want. I want to run a quick demo in our presentation and show how this might work. We have a linear interpolation. Somebody talked to me at Google I/O, and he said, “I have this animation that I want to run with a text element that I want to slide onto the screen. What’s the best kind of interpolation that I should use? What is the correct thing?” And there is no correct thing. The answer is always, unfortunately, it depends. It depends on your situation. It depends on the feel of your application. It depends on what you personally want or what your users would feel is more natural in the context of your application. I suggested that what he should actually do is to write a demo application and play with the different interpolators. Play with the duration. Play with the different timing curves that we have and come up with the one that made sense for his context. Then I realized that maybe we should make that kind of thing available. It’s not actually that hard to write a very limited demo.
What you can see in the demo is what you would expect. We’ve selected a linear interpolator. We’re going to run the animation, and then you’re going to see the animation move in a couple of different ways. It’s going to move along the curve that we’ve drawn on the graph below, and then there’s a couple of random elements on the bottom so you can see what the motion is like, moving left to right, or top to bottom. You can run that a couple of times. You can change the duration. You can do repeating. If you want to get a feel for it over time, you can have it run over and over, but let’s look at the more interesting curves. We can go to the decelerate curve, and you can see the timing representation on the curve on the graph. You can run that and see that it starts out pretty fast and then it decelerates over time. The factor here is one of the parameters that you can use in the constructors. We can change that, see the effect that it has on the timing curve and then run the animation again and get a feel for that motion.
Bounce is kind of cool. You probably don’t want to use this one in general, but just play with it because it’s fun. We have path interpolator which is a very general thing that you can use. You can supply an arbitrary path and get all kinds of kooky behavior. We also have a couple of ways of creating canonical, quadratic, and cubic paths. You can change the curve by dragging the control points and get a feel for that timing. Using the path facility you can reproduce all the other ones that you want. The path interpolator that we introduced in Lollipop was meant to be a more general purpose interpolator from which you could probably get any kind of curve.
We’ll go over to Cubic. This one has a couple of inflection points. You can get more complicated, especially if you drag it to the wrong place. The code is not very interesting. They’re basically different constructors for the interpolators, and then the interpolators just return the floating point value over time.
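For reference, here's roughly what those constructors look like in practice. This is a hedged sketch, not the demo's actual code; the view, duration, and parameter values are illustrative:

Interpolator linear = new LinearInterpolator();
Interpolator decelerate = new DecelerateInterpolator(1.5f); // the "factor" parameter
Interpolator bounce = new BounceInterpolator();
// PathInterpolator(x1, y1, x2, y2) defines a cubic Bézier timing curve (API 21+)
Interpolator cubic = new PathInterpolator(0.4f, 0.0f, 0.2f, 1.0f);

ObjectAnimator anim = ObjectAnimator.ofFloat(view, View.TRANSLATION_X, 0f, 400f);
anim.setDuration(300);
anim.setInterpolator(cubic);
anim.start();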
Arc (16:47)
Arc is related to the timing. We don’t move in a linear fashion in time or space. We don’t generally follow a straight line everywhere. Instead, we move generally nonlinearly. Wouldn’t it be nice if our objects on the screen did the same? Again, to avoid a mechanical feel.
If you’re moving an object from one corner of the screen to the other, it should follow some subtle path along the way. It just looks more organic and natural. So instead of following a straight line, we want to do something more like this. Again, a quick demo.
We can see three different kinds of motion in the demo. We have linear motion where the button is moving from the top left to the bottom right. We can move it either linearly or with path motion. There are a couple of different ways to construct the path motion. These last two look remarkably similar, but the code for creating them is very different, and I wanted to show both ways of creating this. The one in the middle is using ObjectAnimators, manually creating a path and then using ObjectAnimator to animate along that path. The last one is doing a transition, less code for doing pretty much the same thing.
final float oldX = arcMotionButton.getX();
final float oldY = arcMotionButton.getY();
The first step in doing any of these animations is you want to figure out where the button is now.
LayoutParams params = (LayoutParams) arcMotionButton.getLayoutParams();
if (mTopLeft) {
    params.rightToRight = R.id.parentContainer;
    params.bottomToBottom = R.id.parentContainer;
    params.leftToLeft = -1;
    params.topToBottom = -1;
} else {
    params.leftToLeft = R.id.parentContainer;
    params.topToBottom = R.id.trajectoryGroup;
    params.rightToRight = -1;
    params.bottomToBottom = -1;
}
arcMotionButton.setLayoutParams(params);
mTopLeft = !mTopLeft;
Get the current position, the X and Y, and then re-position the views. Then change the layout. In this case, I'm using the new ConstraintLayout, so I'm setting the button to be anchored either to the top-left or to the bottom-right. Then I set the layout parameters, which is going to cause a requestLayout. So we're going to come around later and do a layout, and when that happens, we want a PreDrawListener.
final ViewTreeObserver observer = arcMotionButton.getViewTreeObserver();
observer.addOnPreDrawListener(
        new ViewTreeObserver.OnPreDrawListener() {
            @Override
            public boolean onPreDraw() {
                observer.removeOnPreDrawListener(this);
                // ...
                return true;
            }
        }
);
This is a common animation technique. It's in a lot of the stuff that we showed before. It's also at the heart of how transitions work. We grab the view tree observer. We add a PreDrawListener to it, and then, in the PreDrawListener, we know the layout has happened. Now we can do the magic and figure out where it is at the end state, and then animate from the beginning to the end.
final ViewTreeObserver observer = arcMotionButton.getViewTreeObserver();
observer.addOnPreDrawListener(
        new ViewTreeObserver.OnPreDrawListener() {
            @Override
            public boolean onPreDraw() {
                observer.removeOnPreDrawListener(this);
                float deltaX = arcMotionButton.getX() - oldX;
                float deltaY = arcMotionButton.getY() - oldY;
                PropertyValuesHolder pvhX =
                        PropertyValuesHolder.ofFloat("translationX", -deltaX, 0);
                PropertyValuesHolder pvhY =
                        PropertyValuesHolder.ofFloat("translationY", -deltaY, 0);
                ObjectAnimator.ofPropertyValuesHolder(arcMotionButton, pvhX, pvhY).start();
                return true;
            }
        }
);
Here we have the linear approach, where we find out the new position:

- We figure out the ΔX and ΔY. Where did the button move to?
- We set up an ObjectAnimator to animate the two properties in parallel. We have one PropertyValuesHolder for X and another for Y.
- We set up the ObjectAnimator to do both of those in parallel.
We start the animation, and everything happens, and it moves in a boring straight line down to the corner. What you want to do is make that curved instead. You can do this using the same stuff you saw before, except with that ΔX and ΔY, we now create a path motion instead. We're going to create a path with that information, and we're going to create it with a couple of control points. We move to the starting point, and then we do a quadratic that specifies the single control point that's going to be used to create the curve in between. There's going to be a control point in the middle but offset from that line, which is going to cause the curved motion. Start the animation, and it animates along the path, pretty much the way you want it to.
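Before looking at the transition version, here is a rough sketch of the Path-based ObjectAnimator approach just described. It's a hedged sketch rather than the exact demo code: deltaX, deltaY, and arcMotionButton are the names used in the earlier snippets, this would replace the PropertyValuesHolder code inside onPreDraw, and the control-point offset is purely illustrative:

// Animate translation from (-deltaX, -deltaY) back to (0, 0) along a curve
Path path = new Path();
path.moveTo(-deltaX, -deltaY);
// quadTo(controlX, controlY, endX, endY): the single control point,
// offset from the straight line, is what bends the motion
path.quadTo(-deltaX, 0, 0, 0);
ObjectAnimator.ofFloat(arcMotionButton, View.TRANSLATION_X, View.TRANSLATION_Y, path)
        .start();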
An easier way to do this is using transitions. First of all, you don't even need to know the old position, because the transition is going to figure that out for you. Step one is to re-position the view the same way we did before. Set the layout parameters appropriately, causing a requestLayout, and then call beginDelayedTransition. The transition manager is then going to set up that PreDrawListener on your behalf, figure out where things were and where things are, and then run an animation.
ChangeBounds arcTransition = new ChangeBounds();
arcTransition.setPathMotion(new ArcMotion());
TransitionManager.beginDelayedTransition(parentContainer, arcTransition);
By default, it's going to be linear, but you can change that easily by using an ArcMotion. In the ChangeBounds transition that you're going to use, you can say: use an ArcMotion. It's going to automatically figure out, for the path that it took, an appropriate curve for it to follow. The code here is everything you need to run a curved transition.
Secondary motion (21:17)
Secondary motion is an animation that helps call out the overall motion of some other animation. A simple diagram of this is, we have this bouncing ball over to the left, and you’ll notice the one on the right is doing something else that emphasizes the motion. As soon as the one on the left hits the bottom, then there’s a pulse that just helps emphasize the overall feeling of what the animation is.
We can see this animation in the UI. This is the same, somewhat janky, screen record animation that we had before. In particular, I want to call your attention to the Play button that animates in. You’ve enlarged the album coming into the second activity, and then we animate in the Play button and the icon of the user over on the left. It’s these animations working in conjunction that show you that you are now on this album, and you have the option of playing the album at the same time.
Secondary motion is a great technique, but be careful. It’s pretty easy to start using it all over the place. Play Music does it right. They use it on only one element. But I’ve seen applications, and maybe we’ve been guilty of that at Google, where so many items play secondary animations that it’s overwhelming for the user. Try to focus it on one or two key elements of your UI. Don’t sprinkle it around just because it’s fun to add more animations. If you’re trying to use a lot of animations, things may get noisy on the screen as they overlap. Curved animation, besides making things more organic, can also be an effective way to ensure that items are not colliding or overlapping too much on the screen.
I want to show one more quick example. This is in the notification shade. If you pull down the notification shade, you’ll notice the gear icon at the top is a nice secondary animation. As you’re dragging it down, the gear icon will fade in, and it’ll turn at the time, drawing your attention to this settings thing, and emphasizing the expanding nature of the notification panel that you’re going into. There’re lots of other secondary animations going on in here as well, but the gear icon is the key one that I like there.
Timing and solid drawing (23:23)
A lot of animation is about timing. This specific principle is about using timing to convey a sense of reality, a sense of physicality to that object.
If we have a tiny object that's moving a very short distance, it should move very quickly. You want to convey the sense that it's small and light. On the other hand, if you have an object that is moving a great distance, it's appropriate for it to take longer: because it's a physical object, it should take longer to get there. The same goes for a much bigger object. In general, animations should be as quick as possible, but if I want to communicate that this object has weight to it, then the duration should help convey that. It should take longer to get there, because huge things take longer.
As you work on these kinds of animations, you'll notice that it's all about the viewer's perception. It can be very interesting when you have big transitions where you're fading full-screen elements. For instance, you might want to use a shorter duration, because the object is so big visually that it will feel like the animation is taking longer than it actually does. Again, if you implement something that big, something as big as that circle, in one of your applications, you'll start playing with the timings and see what I'm talking about.
Solid drawing is about conveying a sense of physicality. I originally thought the principle was about having solid drawing skills. That’s not it at all. It’s conveying a sense of solidity to your objects, a sense of physical reality to make sure that the users understand and empathize with the stuff going on onscreen as being real beings.
Here’s an example from material design spec, and you can see examples in the UI as well. This is part of the whole reason behind the shadows. The sense of reality that we wanted with the cards or these paper objects on the screen is that you give them elevation, and they’ll automatically then have a shadow associated with them that helps you understand the physical nature of them.
Other cartoon animation rules (25:43)
I want to go quickly through the rest of the rules, which don’t apply as easily to UIs.
Squash and stretch are about physical nature. As things fall, maybe they lengthen. Gravity is pulling on them. It’s cool, but not terribly useful in UIs because it’s a bit too cartoony. Surprisingly, this is something that does happen in real life. If you’ve never seen a slow motion video of a golf ball hitting a wall or tennis ball hitting a wall, you should look that up on YouTube. You’ll see it’s pretty impressive how much they squash and stretch as they hit a wall. We don’t see it because it goes really fast in real life, but it happens.
The next one is anticipation. It's closely related to staging, where you want to help the user understand what's going on on the screen. You have very few frames in which to do this, especially in old traditional animations. If your character is going to dash from the right to the left, and they first rear back slowly, then we know that this is an anticipation maneuver where they're going to run off in the opposite direction. Again, this is crucial for animation, and not necessarily applicable to a lot of UI stuff.
Straight ahead and pose to pose. These are about the difference between two ways of drawing an animation. The traditional way of animating things is pose to pose: you have these key frames, the major poses get drawn, and then some junior animator will come in and waste their time drawing the stuff in between. What they can do instead, to convey a sense of frenetic energy, is draw every single frame individually, straight ahead, rather than pose to pose. That creates this extra energy because of all the noise going on between the poses. Again, not terribly related to UIs.
Follow-through and overlap are about physical objects. If you hit a wall, your bones are going to stop immediately, but the flesh on your body is going to continue. The golf ball is also a good example of this. Some parts of the object may stop at the hard wall, but the parts that weren't constrained are going to continue. That's the follow-through that helps the user understand that this is a real physical being. Again, kind of cartoony for UIs.
Exaggeration. This is terribly useful in cartoons where part of the idea is to have fun. You want it physical, but you also want it to be surreal, more than real. Again, nice in cartoons, but not necessarily something that we want in our UIs.
Appeal. Wouldn’t it be nice if people actually had empathy with your characters? You want to make them appealing. Give them charisma. This is certainly true of your UIs. You want your UIs to be appealing.
Colors (28:56)
Let’s talk about colors. There are two reasons why I wanted to talk about this.
First, it's my current obsession. There's a point that many, maybe most, applications get wrong. Even some very fancy applications, like Photoshop on your desktop, sometimes won't get it right. I wanted to talk a bit about this to help you fix your applications, if it actually matters to you.
I also wanted to talk about color spaces because over the past couple of years we've seen the introduction of wide gamut displays. We have the new specifications for UltraHD TV, such as 4K and 8K displays. They have really large color spaces. We're starting to talk about HDR. Apple has started shipping wide gamut displays with their iMacs. The chances are that in the next few years, we're going to see this kind of technology reach mobile phones. Then you'll have to start worrying about color spaces. This is going to be a gentle introduction to this incredibly complex problem.
The first point is the issue of gamma versus linear spaces. To be clear, I’m using terminology in this talk that’s a gross simplification of the actual color science. If there is a color scientist in the room, you probably will be shocked and hate me for what I’m going to say, but I want to keep things simple. The key takeaway of gamma versus linear is that you are doing it wrong in your application. To understand why you’re doing it wrong, we have to go back all the way to the early beginnings of computer science, with CRT monitors.
The way a CRT monitor works is there is an electron gun, which sounds really cool. It's actually pretty boring in practice. It fires a bunch of electrons at the phosphorescent screen, and there's a mask that creates the RGB values.
To better understand exactly what happens in a CRT monitor, let's imagine we want to display a gradient. This is the input to our monitor. The horizontal axis is the pixel coordinates, and the vertical axis is the color. This is a black to white gradient. We're sending that curve to the monitor and saying, "Please display my beautiful black to white gradient." The equation for that is simply y = x. What the monitor will actually do is display a different curve, called the gamma curve: x raised to the power of 2.2. The side effect of that is that your beautiful black to white gradient is now darker than you intended it to be.
You didn’t do anything wrong. You wrote your app. You did the thing that makes sense, and yet it’s going to look dark onscreen. This happens because of the way electron guns work. It’s physics. We can’t change it. Physics is annoying sometimes, and this is one of the situations where it is.
What we have to do is correct for the gamma curve. This is called gamma correction, and I'm sure you've heard about it. You've probably seen it in the UI of your operating system. All you have to do is apply the inverse curve to your input; here it's x raised to the power of 1/2.2. If we output this inverse curve to the monitor, the output will be our linear gradient, the thing we wanted in the first place.
Things are a little more complicated in practice. The gamma of certain monitors is actually closer to 2.5. The reason we use 2.2 is that in standard lighting conditions you have lights, and lamps and you have the window, which decreases the perceived contrast of your monitor. We correct for that by using the wrong gamma curve.
When we have an LCD screen, do we still need to worry about gamma correction? I'm sure you can guess the answer. The answer is yes, and the reason is our eyes. It turns out that the response of our eyes to incoming light is also nonlinear. Our visual system follows a mathematical law called Stevens' power law. The equation is S = k·I^γ: the stimulus is called I, and in our case, I is the light, the brightness of the light that enters your eye. The subjective magnitude of sensation S, the brightness you actually perceive inside your brain, is a function of the stimulus raised to a power. In Stevens' power law that power is denoted gamma (γ), and that's why we talk about gamma curves. The gamma value depends on the type of stimulus. For light hitting the eye, under common lighting conditions, it happens to be 0.5. Here I've done a bit of what I call graphics math: 0.5 is pretty much equal to one divided by 2.2. It's actually one divided by two, but in graphics, it doesn't matter. If it's almost true, it is true.
Through a mix of sheer luck and a bit of engineering, it turns out that our CRT monitors have exactly the inverse response curve of our eyes, which makes things work well for everybody. Now, if we go back to our LCD displays, they do not have a gamma response curve. Some of them are more linear. Some of them have a more S shape. We have hardware in our LCD monitors that make them have that gamma response curve. You may be wondering why do we bother? One of the reasons was for compatibility with existing CRT monitors. It’s nice not to have to rewrite all the applications just because someone invented a new type of display. But the real reason we keep doing that is gamma encoding.
We talked about gamma correction. Today, it’s not about correction anymore. It’s about compression. The gamma curve is actually a compression scheme for images. You know about JPEG. You know the way PNG works. You can ZIP your images. It turns out, we also use gamma to compress our images. The reason for that is, if we go back to that curve from our eyes, what that curve shows is that our eyes are more sensitive to the dark tones and the mid-grays than the highlights. Which means that if we encode our images linearly, if you just have your values from zero to one without doing anything specific, you just write your for-loop, we are going to waste a lot of precision in our bits to encode the value.
Here is an example. We only have 32 values to encode the gradient. You can see that with a linear encoding scheme, we spend a lot of our bits encoding the highlights, where our eyes are not very sensitive. It turns out that if we were to encode our images linearly, which seems like the obvious way of doing it, we would need about 12 bits per color channel to encode an image. If you've used image processing tools before, or if you've used bitmaps in the past, you may have noticed that we use eight bits. The reason we can use eight bits is the gamma compression scheme.
When you encode an image with a gamma curve, you’re effectively redistributing the precision of the bits in the dark tones. Then we can encode everything the eye can see over only eight bits, and that’s why it’s a compression scheme. Compressing with gamma is not necessary if you have high precision formats. They can encode in linear space if they are precise enough. Some of the formats are now commonly used, especially in the movie industry or the gaming industry, in their production pipelines. For instance, there’s OpenEXR for HDR images. There’s PNG-16. Photoshop lets you store in 16 and 32 bits. If you have a camera, and your camera can take RAW images, then the RAW file will effectively be stored in linear over 16 bit per channel.
What that all means is that when you have a photo, the colors you see on the screen are not the colors the way they are stored on your hard drive. The same photo, just encoded with that gamma curve, looks brighter. That way, we have more precision in the darks. Again, you will never see the picture like that, but this is actually how it’s stored. This is how cameras create JPEGs. The only reason your pictures are so nice is because the hardware compensates for it.
Do all the math in linear space (39:38)
Now color pickers in applications, those are an interesting use case because you are picking a color by looking at the screen. You have this nice color wheel and say, “I want to pick that red.” You are seeing the color picker through the gamma curve of the display. Effectively you are choosing a color that has been gamma encoded or gamma compressed.
Unfortunately, some of the color pickers get that wrong. I won't go into too much detail, but I noticed that the Mac OS X color picker is not doing it correctly when you choose a grayscale value using that little slider that you see at the bottom. But for all intents and purposes, you can assume that any color that you pick in Photoshop, or Sketch, or whatever application you're using has been gamma encoded and is not a linear color. It also means that every time you write a color value in your application, be it in your code or your XML resource files, it's a gamma encoded value.
That is important because you are not in a linear space, so if you’re going to do math on those values, say to compute the average of two colors, the result is going to be wrong because you are not in a linear space. You are on that gamma curve. We’re going to look at an example with graphics.
Let's imagine we have the linear curve. We have a black color; the value is zero. We have a white color; the value is one. Let's imagine we want to find the average. What is the value that's halfway between zero and one? It's obviously 0.5. Unfortunately, that is wrong. Because if you output that 0.5 value to the display, remember that the display is going to apply its own gamma curve, so the value you're going to get is about 0.22 (0.5 raised to the power of 2.2), which is much, much darker than the mid-gray between zero and one. Those two values, black and white, live on the gamma curve. Before you do any processing on them, you must compensate for that gamma curve. We're going to take a look at a code example, so you can see how to do it.
Here's another example in which we have a bunch of gradients. You could imagine these are gradients that your app is generating. You're creating your bitmaps, you just have a simple for-loop, and you're interpolating colors, maybe over time. Let's say you're animating from red to green. If you look at those gradients, they look pretty nice, but towards the middle, the colors get darker. We start from the top one with this bright red. We have this bright green. And yet at some point in the middle, we become dark. That doesn't make any sense, because between bright red and bright green, we should see only bright values. The reason is we are doing linear math in the wrong space. If we compensate for that, the gradients look a lot nicer. You can see that now, in the middle between red and blue, we go through a bright purple, and not a dark one anymore, and similarly between green and blue. The takeaway is: do all your math in linear space. Doing that is fairly easy.
// Gamma encoded colors
int color1 = getColor1();
int color2 = getColor2();
// Extract red and convert to linear
float r1 = ((color1 >> 16) & 0xff) / 255.0f;
r1 = (float) Math.pow(r1, 2.2);
float r2 = // …
// Do the math and gamma-encode
float r = r1 * 0.5f + r2 * 0.5f;
r = (float) Math.pow(r, 1.0 / 2.2);
Here's a piece of code. We have a couple of colors. Usually, on Android, they're stored as ints. The first thing we're going to do is extract the red channel of both colors. There's a bit of shifting and masking. We want to move to a float value between zero and one. So far, it's pretty easy. Now we have our r1 variable, and all we have to do is apply the gamma curve. Remember that that color is gamma encoded, gamma compressed. We just need to apply the 2.2 gamma curve to bring it back into linear space. We do that for the second color. Then we can do our linear math. Here we're just finding the average of the two colors. When we have our result, we re-gamma encode. That's all you have to do. Anywhere in your applications where you do math on colors, do it this way. The results are going to be much nicer. One crucial point: do not gamma decode the alpha channel.
Alpha is supposed to be linear. Unfortunately, Photoshop gets that wrong by default and gamma encodes the alpha channel. Make sure to go into the color settings in Photoshop; there is a color setting for gray. By default, it's set to Dot Gain 20%. Choose sGray, and that's going to fix everything.
If you do 3D, you should use OpenGL. OpenGL has a lot of extensions to do that automatically. There is dedicated hardware in the GPUs to do it for you, and it's going to be free. You don't even have to write anything in shaders. It's just a matter of setting up your textures correctly.
Remember that this issue affects everything. We saw gradients. We saw color interpolation. But that also means that if you do a blur, or if you downscale or upscale an image, it matters too: downscaling an image is effectively averaging every pixel with its neighbors, and if you do that in the wrong space, you are making the image darker. Animations, we just saw an example. 3D lighting: if you're doing lighting in OpenGL, you should also do it in linear space. Otherwise, you're going to have huge shifts, and the colors are going to look wrong.
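To make that concrete, here is a hedged sketch of a gamma-correct 2x downscale of an Android Bitmap, using the 2.2 approximation of the sRGB curves. A real implementation would use the exact piecewise functions, handle odd dimensions, and avoid getPixel/setPixel for performance; the method name is made up:

Bitmap downscaleInLinearSpace(Bitmap src) {
    int w = src.getWidth() / 2;
    int h = src.getHeight() / 2;
    Bitmap dst = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int a = 0;
            double r = 0, g = 0, b = 0;
            for (int dy = 0; dy < 2; dy++) {
                for (int dx = 0; dx < 2; dx++) {
                    int c = src.getPixel(x * 2 + dx, y * 2 + dy);
                    a += (c >>> 24);                                // alpha is linear: average directly
                    r += Math.pow(((c >> 16) & 0xff) / 255.0, 2.2); // decode to linear before averaging
                    g += Math.pow(((c >> 8) & 0xff) / 255.0, 2.2);
                    b += Math.pow((c & 0xff) / 255.0, 2.2);
                }
            }
            int outA = (a + 2) / 4;
            int outR = (int) (Math.pow(r / 4, 1 / 2.2) * 255 + 0.5); // re-encode to gamma space
            int outG = (int) (Math.pow(g / 4, 1 / 2.2) * 255 + 0.5);
            int outB = (int) (Math.pow(b / 4, 1 / 2.2) * 255 + 0.5);
            dst.setPixel(x, y, (outA << 24) | (outR << 16) | (outG << 8) | outB);
        }
    }
    return dst;
}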
If you ever want to know whether an application you are using does things correctly, you can use this pattern. It's a series of black and white stripes. In the middle, at the top, we have gray; the value is 128 over 255, and at the bottom it's 187. 128 is the average of black and white in gamma space, and 187 is the average of black and white in linear space. Put that image in Photoshop or Chrome or whatever app you want to try, and downscale the image to 50% of its size. The black and white stripes should become the color of the bottom square. If they turn into the color of the top square, the application is doing it wrong.
Color spaces (45:38)
We saw the gamma curves, and they are fairly easy to implement, but things are actually more complicated than that. The big question is, what is a color? We all think in terms of RGB, and we've seen RGB everywhere in our code. Red is 255, 0, 0, right? That's the definition for users. The definition for developers is that a color is actually a tuple of numbers inside a color model, associated with a color space. RGB is a color model, and the tuple of numbers is the three values that are used to define R, G and B. CMYK, which is often used for printing, is also a color model. It has four values in the tuple.
We’re going to forget about everything but RGB. We’re just going to focus on RGB. The big question is, if I tell you that I have an RGB value of 100 for red, what kind of red are we talking about? It’s an important question, because if you look at the visible spectrum that we can perceive with our eyes and our brains, this is what it looks like. And color science is interesting.
The way we came up with this spectrum is, in the 1920s, a bunch of scientists took random people, as far as I know, and asked them to look at colors. They asked them if they could differentiate between the colors, and after asking enough people, they decided that this is what we can see. The problem is that we don't have displays, and we don't have hardware that can record or even display the entire visible spectrum. That's why saying that you have a red of 100 is meaningless unless you know what slice of the visible spectrum we are talking about. That's what color spaces are about.
You can see on the screen there are a few triangles overlaid on top of the visible spectrum, and those are fairly common color spaces. You might have seen them before or heard about them. There is sRGB. That is typically your laptop monitor or your desktop monitor. We have Adobe RGB and ProPhoto RGB that are commonly used by high-end cameras or image processing applications, like Adobe Lightroom. Those are much wider than sRGB. You can see here that the red value for sRGB is not going to be in the same spot as the red value for ProPhoto RGB. It's not going to be the same red as seen by your eyes.
What defines a color space is we have three primaries, a white point and, what interests us the most, conversion functions. The primaries define the vertices of those triangles. It’s simply the red, green and blue, and their location in the visible spectrum. The white point simply defines the neutral color. You might have seen displays that feel bluish or yellowish, and that’s because the white point is different than what you’re used to.
Color spaces are actually not in 2D. They exist in 3D, so the slices that you see here are the footprint of a color space at the minimum brightness, but if we vary the brightness over the third axis, you see that that’s what the color space looks like. It’s interesting to see it in 3D, because as I mentioned earlier, our eyes are more sensitive to the dark tones. You can see that in the 3D shapes. We have more data in the dark tones and less in the highlights.
Conversion functions. What are they? They are the equivalent of the gamma functions that we saw earlier. Remember the 2.2 exponents and the one divided by 2.2? Unfortunately, they are a little more complicated than that in practice. They have really complex names. Instead of talking about gamma curves, we talk about the optoelectronic conversion function or the OECF. That one is the equivalent of the gamma curve when raised to the power of one divided by 2.2. It is used to convert from the linear space to the gamma-compressed space. The inverse function is called the electro-optical conversion function, or EOCF. Every color space must have those two functions defined.
So which color space should you be using? The only one you can assume, especially on Android and on mobile phones, is sRGB. This is how every application on your desktop works by default unless you do something else. This is basically how the Web works. They are trying to fix it because we're starting to see wide gamut displays. Unless you know what you're doing, unless you know otherwise, always assume that the colors you are using are in the sRGB space.
This is the real equation for the sRGB space. It’s a piecewise function. There’s a small linear step at the beginning, so it’s the X multiplied by 12.92. That one is important, because if you want the highest quality possible for your conversions, remember that I said that our eyes are very sensitive to the dark tones, and this is why we have that linear function at the beginning, in the very, very, very dark tones. The second part of the function that looks complicated can be approximated with the 2.2 power expression. It’s the exact inverse function.
float OECF_sRGB(float linear) {
    float low = linear * 12.92f;
    float high = (float) (Math.pow(linear, 1.0 / 2.4) * 1.055 - 0.055);
    return linear <= 0.0031308f ? low : high;
}
This is the Java implementation of the OECF function. Instead of raising to the power of one divided by 2.2, you have to write an if statement for the piecewise parts. The hardware that we have today is very efficient, but this can be pretty slow if you are going to process a really large image or if you just want things to go fast. There are various ways you can optimize this conversion. The first one is to use lookup tables. You can precompute. For instance, if you have sRGB, eight bits, you have 256 values. You can precompute a table that applies the gamma function. When you want to decode, you have to use 16 bits of precision because we are in this gamma encoded space. Or you can start applying graphics math, as I mentioned before, so the big, complicated function can be approximated with the 2.2 power that we saw earlier. Or if you want to go even further and optimize even more, x to the power of 2.2 is almost x squared. You can use x squared instead, and the square root for the inverse. Don't worry about doing funky math.
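As a rough illustration of the lookup-table idea (a sketch assuming 8-bit sRGB-encoded input; the table name is made up), you can precompute the exact decode function once and then index into it:

// 256-entry table mapping 8-bit sRGB-encoded values to linear floats,
// using the exact piecewise inverse (EOCF) of the function above
static final float[] SRGB_TO_LINEAR = new float[256];
static {
    for (int i = 0; i < 256; i++) {
        float srgb = i / 255.0f;
        SRGB_TO_LINEAR[i] = srgb <= 0.04045f
                ? srgb / 12.92f
                : (float) Math.pow((srgb + 0.055) / 1.055, 2.4);
    }
}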
This is a comparison between the different approximations of the conversion function for sRGB. The correct one is the blue one. It’s almost the same as the 2.2 gamma curve. If we were to zoom in at the very beginning around the origin of the axis, we would see there’s a bigger difference because of the linear part of the function in sRGB. You can see that the square root approximation is pretty far off, but it’s still going to look fairly good on your screen.
Now to make things even more complicated, Android TVs work a little differently. They don’t use sRGB. They use something called Rec. 709 for HDTV. So for 720p and 1080p content, the standard is Rec. 709. Rec. 709 uses the same primaries and white point as sRGB, so the colors will actually be the same. The only difference is the conversion functions. They’re also fairly complicated, but an approximation is a gamma curve of 2.4. If you do write applications for Android TV, and you are doing any kind of image processing or you’re interpolating colors, you might want to go the extra mile and use a slightly different gamma curve for your computations to make your colors look even nicer on TVs.
UltraHD TV, 4K displays, and 8K displays use another color space called Rec. 2020, and this one has conversion functions similar to Rec. 709's. That part is fairly easy for you to do, but the color space itself is very different. Here's a comparison between the color spaces. On screen, you see sRGB and Rec. 709; the only difference is the gamma curve, and that's why one of them looks rotated compared to the other one. The giant triangle here is Rec. 2020. It's a really, really large space. That means we need to do more than just apply a gamma curve to use it properly. In fact, it's so large that the mathematical application used to draw the diagram just went bonkers in the bottom right. That was Mathematica saying, "Math is hard." Something important is that in the Rec. 2020 color space, all the colors are inside the visible spectrum.
Conclusion (54:38)
So I said earlier that you are doing it wrong. The good news is that Android is doing it wrong. Everywhere. All over the place. I could give you excuses like we started with really bad devices, and they had very shitty CPUs, and we couldn’t afford to do all those conversions.
So where are we doing it wrong?

- Gradients. They are wrong on Android.
- Animations. We fixed them in N, so they are less wrong than they used to be.
- Resizing bitmaps. It's wrong.
- Blending is wrong.
- Anti-aliasing is wrong.
- And everywhere else.
I hope that someday I will be able to go all over the code and fix all the blending equations that we have in the bitmaps resizing and make them look better.
So let’s recap. What should you do? You have an input color. Assume it’s sRGB. Apply the inverse conversion function; X raised to the power of 2.2. You end up with a linear input. Then you can do your math. And when you’re done with your math, you gamma encode. You compress again in the gamma space, and you can send that to the display and everything looks nice.
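Putting that recap into code, here is a minimal sketch (the method names are illustrative) that blends two packed ARGB colors in linear space, using the 2.2 approximation and leaving alpha linear:

static int lerpSrgbColors(int c1, int c2, float t) {
    float a = lerp(((c1 >>> 24) & 0xff) / 255.0f, ((c2 >>> 24) & 0xff) / 255.0f, t);
    float r = lerpChannel((c1 >> 16) & 0xff, (c2 >> 16) & 0xff, t);
    float g = lerpChannel((c1 >> 8) & 0xff, (c2 >> 8) & 0xff, t);
    float b = lerpChannel(c1 & 0xff, c2 & 0xff, t);
    return ((int) (a * 255 + 0.5f) << 24)
            | ((int) (r * 255 + 0.5f) << 16)
            | ((int) (g * 255 + 0.5f) << 8)
            | (int) (b * 255 + 0.5f);
}

static float lerpChannel(int e1, int e2, float t) {
    // encoded 8-bit value -> linear -> interpolate -> back to encoded [0..1]
    float l1 = (float) Math.pow(e1 / 255.0, 2.2);
    float l2 = (float) Math.pow(e2 / 255.0, 2.2);
    return (float) Math.pow(lerp(l1, l2, t), 1.0 / 2.2);
}

static float lerp(float a, float b, float t) {
    return a + (b - a) * t;
}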