Instrumentation Testing Robots

Libraries like Espresso allow UI tests to have stable interactions with your app, but without discipline these tests can become hard to manage and require frequent updating. In this talk Jake will cover how the so-called robot pattern allows you to create stable, readable, and maintainable tests with the aid of Kotlin’s language features.


Mr. Green Guy (0:00)

This topic is broadly applicable. Whether you’re an Android developer or someone who works on desktop Java UIs, web clients, or server APIs, anything that involves testing can take advantage of this pattern to great effect.

Android will be the horse we ride through this pattern, but it could be applied anywhere. That pattern is the robot pattern.

Before I get into what the robot pattern is, I want to talk quickly about how testing works at a high level. Say you have a QA team, represented here by a little green guy, and your QA team is using a computer. They interact with it, whether it is a web app, a desktop app, or a phone.

When this QA team runs through tests for your service or your app, they’re really only interacting with one thing: the view. It’s how your app presents itself, and they’re performing high-level tasks to verify that they work.

However, behind the scenes there are all these other pieces of our architecture. The one I chose is model-view-presenter (MVP). There are reasons behind those architectures, but they don’t matter to the green guy, because he’s only interacting with the view.

Behavior Architecture (2:43)

The reason we have these architectures, though, is to save us work. Perhaps our backend changes. Maybe our database changes (very topical) from SQLite to Realm. Our model gets swapped out underneath us, and now we only have to update the presenter; our view doesn’t have to change.

Or maybe the requirements of the application change such that you want to swap out the view. Now, again, you only need to update part of your presenter, and the model doesn’t really care about the view. We have this nice separation of concerns.

That gets us all kinds of other awesome stuff. We could stick a fake model behind the presenter and put a unit test in front of it, and now we can test the business logic inside the presenter.

Maybe we flip that: we stick a unit test behind the view and verify that whenever the presenter calls certain methods on the view, the view reacts how we expect. The architecture is what provides these things. Without it, you can’t really get that level of granularity and flexibility, whether for testing or just for swapping out parts of these magic boxes.
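The fake-model idea can be sketched in Kotlin as below. This is a minimal illustration, not code from the talk: PaymentModel, PaymentView, PaymentPresenter, FakeModel, and RecordingView are hypothetical names, and the rule that a payment succeeds when it fits within the balance is invented for the example.

```kotlin
// Hypothetical MVP types for the payment screen; the names are illustrative only.
interface PaymentModel {
    fun send(amountCents: Long, recipient: String): Boolean
}

interface PaymentView {
    fun showSuccess()
    fun showFailure()
}

// The presenter owns the business logic and only sees the model and view through interfaces.
class PaymentPresenter(private val model: PaymentModel, private val view: PaymentView) {
    fun onSendClicked(amountCents: Long, recipient: String) {
        if (model.send(amountCents, recipient)) view.showSuccess() else view.showFailure()
    }
}

// Fake model: succeeds only when the amount fits within a fixed balance (invented rule).
class FakeModel(private val balanceCents: Long) : PaymentModel {
    override fun send(amountCents: Long, recipient: String) = amountCents <= balanceCents
}

// Recording view: remembers what the presenter asked it to display.
class RecordingView : PaymentView {
    var shown: String? = null
    override fun showSuccess() { shown = "success" }
    override fun showFailure() { shown = "failure" }
}

fun main() {
    val view = RecordingView()
    PaymentPresenter(FakeModel(balanceCents = 150_00), view).onSendClicked(42_00, "foo@bar.com")
    check(view.shown == "success")
}
```

Because the presenter only depends on the two interfaces, swapping in FakeModel is what lets this run without a network or database.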

But again, our green guy, the guy that’s actually doing the acceptance or functional tests of your app, only cares that there’s a view there.

Testing Expectations (3:38)

In this exploration, we’re going to use this app I beautifully designed, which loosely resembles Square Cash, the product I work on. A test for this app might be something like an acceptance test. For example, a high-level functional test might be: take 42 dollars, send it to a specific email address, and verify that the transfer was successful.

This is the high level of what we want to accomplish. This is what we want the test to validate. But there’s also the how. And the how is interpreted by our friendly little green guy.

He’s going to type the number 42 on the number pad, tap on the little input box, type the email on the keyboard that pops up, press the Send button, and wait until the screen transitions and he sees a checkmark.

But this how is interpreted in real time by our green guy. What he’s really looking for is the what: he’s making sure that we can send money successfully.

Traditional Testing (4:42)

We’re all programmers here, and we can automate these things; that’s what programming is for. Instead of forcing the green guy to do these mundane tasks every time we make a release, I’m going to write an automated test that performs that function for us, so that we no longer have to pay the green guy to do a mundane job.

A way that you might write this test is something like this:

PaymentScreen pay = (PaymentScreen) obtainScreen();

pay.amountView.setValue(42_00);
pay.recipientView.setText("foo@bar.com");
pay.sendView.click();

Thread.sleep(1000);

assertThat(obtainScreen()).isInstanceOf(SuccessScreen.class);

You say: give me the screen that’s being displayed. Grab the fields out of it. Shove some data in. Find the button on the screen. Click on it. Wait a bit, because this is asynchronous. Then get the current screen again and make sure that it’s the success screen, the screen that shows our payment succeeded.

The problem is that you’ve essentially taken your test and shoved it into the view, tightly coupling the two together.

Go back to wanting to swap these things out: if our view changes in any significant fashion, we’re going to have to change our test as well. You’re not throwing it out, but you are refactoring it. Your test ended up being about how the test was accomplished, not about what was being accomplished.

As a result, whenever you have to change it you have to reinterpret parts of what the test was doing in order to make sure that the behavior is the same on the other side.

But if we go back to our green guy, you did a poor job of replacing his function. You now have tightly coupled code. Inevitably, business requirements, technologies, all this stuff changes, and you’re going to have to go and change your test. Whereas if you swap the view on the green guy, he doesn’t care; he just reinterprets the test every single time and applies it. He’s still focused on what we want to test, not how.

What we need to do is replace him with our tests.

Improved Testing (7:15)

A way to achieve this is to start from the outside and dive in, acting like the mouse or a finger instead of digging into your app code and the views it has.

findViewWithText("4").click();
findViewWithText("2").click();
findViewWithHint("Recipient").setText("foo@bar.com");
findViewWithText("Send").click();

Thread.sleep(1000);

findViewWithText("Success!");

“I’m gonna find this view and click on it.” You end up encoding all of these steps, which represent the same test. We’re still sleeping. We find our view and verify it. This is the same test. In the Android world, Espresso helps you do some of this:

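import static androidx.test.espresso.Espresso.onView;
import static androidx.test.espresso.action.ViewActions.*;
import static androidx.test.espresso.assertion.ViewAssertions.matches;
import static androidx.test.espresso.matcher.ViewMatchers.*;

onView(withText("4")).perform(click());
onView(withText("2")).perform(click());
onView(withHint("Recipient")).perform(typeText("foo@bar.com"));
onView(withText("Send")).perform(click());

onView(withText("Success!")).check(matches(isDisplayed()));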

It interacts with your app through the view hierarchy. But instead of doing it through the classes that you write, it behaves like your app is an opaque set of views. You instruct it to find certain views and perform actions on them, as if it was the user touching on the screen or typing on a keyboard.

When this came out, everyone was ecstatic. It really simplified their tests. If you noticed, the Thread.sleep() disappeared: Espresso waits for the app to become idle before each interaction, so it takes care of waiting for asynchronous actions for us.

You might feel happy until this part of the presentation, when I say: we have our expectation of what we want to test, but your test is still the how. You’re still encoding how the test has to be performed in order to run it. If your view changes significantly, you’re going to have to go into every single test and update which views it’s looking for and the order in which they’re operated on, which is tedious. That’s not what we want. We’re programmers; we’re lazy. We want the most efficient approach, the one that minimizes the amount of work we have to do.

The reason we have the model-view-presenter architecture is that we really want to separate the what from the how. The model is the what: the raw data from the network, the database, the file system, whatever. Our presenter is the one that molds that into how the data gets pushed into the view.

On the flip side, we have our test that we’ve written with Espresso, or any of these tests that you write for whatever platform, and we shoved the two concepts together – the what and the how. This is the fundamental problem. This is the thing we need to fix.

Thus, I’m proposing the robot pattern I’m about to show as a fantastic way to get that separation of concerns. The language features Kotlin offers then help make the tests very expressive, very terse, and resilient to these changes.

Robots: Separating the What From the How (9:50)

What is a robot? Ultimately, this is a pattern and it’s open for interpretation. I will show you two examples which are just ways to interpret this pattern. Again, this is kind of specific to Android, but you really can apply this to whatever platform as long as you just think about the separation and think about how you can accomplish it.

A robot is just a class that encodes the high-level actions that we want to take on a view, or on anything else. Again, this could be a server that exposes functionality through endpoints. You want the way you interact with those endpoints to be very high-level and descriptive.

Then there’s the nitty-gritty of how you actually talk to those endpoints: the serialization format they use, the parameters that are required. Those are the how, and the how is what we’re going to encode in this robot.
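As a rough illustration of a robot in front of a server, here is a Kotlin sketch. FakeServer, PaymentApiRobot, ApiResultRobot, the /payments path, the amount_cents and to parameter names, and the status codes are all hypothetical; the point is only that they live in the robot, not in the test.

```kotlin
// Hypothetical in-memory server; the "/payments" path and parameter names are invented.
class FakeServer(private val balanceCents: Long) {
    val log = mutableListOf<String>()

    fun post(path: String, params: Map<String, String>): Int {
        log += "POST $path $params"
        val amount = params["amount_cents"]?.toLongOrNull() ?: return 400
        return if (path == "/payments" && amount <= balanceCents) 200 else 402
    }
}

// The robot is the only place that knows the path, parameter names, and status codes.
class PaymentApiRobot(private val server: FakeServer) {
    private var amountCents = 0L
    private var recipient = ""

    fun amount(cents: Long) = apply { amountCents = cents }
    fun recipient(email: String) = apply { recipient = email }
    fun send() = ApiResultRobot(
        server.post("/payments", mapOf("amount_cents" to amountCents.toString(), "to" to recipient))
    )
}

class ApiResultRobot(private val status: Int) {
    fun isSuccess() = apply { check(status == 200) { "Expected 200 but was $status" } }
    fun isFailed() = apply { check(status != 200) { "Expected a failure but was $status" } }
}

fun main() {
    // The test states only the "what": send $42 to foo@bar.com and expect success.
    PaymentApiRobot(FakeServer(balanceCents = 150_00))
        .amount(42_00)
        .recipient("foo@bar.com")
        .send()
        .isSuccess()
}
```

If the endpoint moved or the body switched to JSON, only PaymentApiRobot.send would change.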

Simple PaymentRobot (10:40)

For our glorious little app, we have a few things that we can do on this payment screen. The robot is aptly named PaymentRobot:

class PaymentRobot {
    PaymentRobot amount(long amount) { ... }
    PaymentRobot recipient(String recipient) { ... }
    ResultRobot send() { ... }
}

class ResultRobot { 
    ResultRobot isSuccess() { ... }
}

We can enter an amount. We don’t know how the amount gets entered. The only thing we know is that if we want to enter an amount on this screen, it requires some value.

We can enter a recipient. We don’t know where the recipient’s going on the screen, we just know that the screen allows us to enter a recipient field.

Then we can take an action on this screen: pressing the Send button. In our app, when you press Send, the context changes and we move to a results screen that tells you whether the transaction succeeded or failed. Thus, our send action can return a different robot that knows how to interact with the screen that comes up.

We can verify whether or not that results screen is showing success. Ultimately, if the payment failed, some hypothetical assertion happening in here would fail and break your test.

And so there’s all kinds of crap we can shove in here, which is the how. It doesn’t matter what it is; it’s specific to your app, your framework, your platform, however you’re testing. The implementation of the robot is the how. We’ve taken the high-level intentions of how we want to interact with the app, extracted them, and encoded how those intentions are manifested in the app behind this robot.

Now, we have this sweet little API that we can use:

PaymentRobot payment = new PaymentRobot();

ResultRobot result = payment
    .amount(42_00)
    .recipient("foo@bar.com")
    .send();
    
result.isSuccess();

Just say, “Hey, when I start the app I know I’m on the payment screen, so give me the payment robot.” We’re going to send a $42 payment to our friend Foo Bar here. That gives me back the robot for the next screen, and I assert that the payment was successful.

Here, we’ve encoded the what into our test through the robot’s builder-style API, as opposed to the how. This test describes exactly our original problem statement, the script that the QA guy would follow.

The script doesn’t say press these buttons. It says, draft a $42 payment to Foo Bar here. Press Send. Verify success. Both of these are the what now. The big pile of implementation junk is the how of the robot.

Your test is declarative and terse, and your robots are a little crazy. But at least the how is only encoded once: the robot is written once, and the tests are written many times.
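Here is one way the bodies of those robot methods might look, sketched in Kotlin against an in-memory stand-in for the UI. FakeUi, its keypad, field, and screen names, and the assumption that the keypad enters cents are all hypothetical; on Android, those bodies would instead be Espresso calls like the ones shown earlier.

```kotlin
// Hypothetical stand-in for the app's UI: a keypad, a recipient field, a Send button.
class FakeUi(private val balanceCents: Long) {
    var screen = "payment"
        private set
    private val keypad = StringBuilder()
    private var recipientField = ""

    fun tapKey(key: Char) { keypad.append(key) }
    fun typeIntoRecipient(text: String) { recipientField = text }
    fun tapSend() {
        val cents = keypad.toString().toLongOrNull() ?: 0L // assumes the keypad enters cents
        screen = if (cents in 1L..balanceCents && "@" in recipientField) "success" else "failure"
    }
}

// Public methods are the "what"; their bodies are the "how".
class PaymentRobot(private val ui: FakeUi) {
    fun amount(cents: Long): PaymentRobot {
        cents.toString().forEach(ui::tapKey) // tap each digit, like the green guy would
        return this
    }

    fun recipient(email: String): PaymentRobot {
        ui.typeIntoRecipient(email)
        return this
    }

    fun send(): ResultRobot {
        ui.tapSend()
        return ResultRobot(ui) // the screen changed, so hand back the next screen's robot
    }
}

class ResultRobot(private val ui: FakeUi) {
    fun isSuccess(): ResultRobot = expect("success")
    fun isFailed(): ResultRobot = expect("failure")

    private fun expect(screen: String): ResultRobot {
        if (ui.screen != screen) throw AssertionError("Expected <$screen> but found <${ui.screen}>")
        return this
    }
}

fun main() {
    PaymentRobot(FakeUi(balanceCents = 150_00))
        .amount(42_00)
        .recipient("foo@bar.com")
        .send()
        .isSuccess()
}
```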

One Robot, One Hundred Tests (14:15)

This is a test for the happy case. This is a test just to see if the app is working. The transaction goes through:

@Test public void singleFundingSourceSuccess() {
    PaymentRobot payment = new PaymentRobot();
    
    ResultRobot result = payment
        .amount(42_00)
        .recipient("foo@bar.com")
        .send();
        
    result.isSuccess();
}

We also need to test the case where someone tries to send a million dollars, which we’re not going to allow:

@Test public void singleFundingSourceTooMuch() {
    PaymentRobot payment = new PaymentRobot();
    
    ResultRobot result = payment
        .amount(1_000_000_00)
        .recipient("foo@bar.com")
        .send();
        
    result.isFailed();
}

Another scenario to test is when your account doesn’t have the required money. Maybe you only have 150 bones and are trying to send 1,000; it’s not going to work. We need to validate that too:

@Test public void singleFundingSourceInsufficientFunds() {
    PaymentRobot payment = new PaymentRobot();
    
    ResultRobot result = payment
        .amount(1_000_00)
        .recipient("foo@bar.com")
        .send();
        
    result.isFailed();
}

It’s essentially a combinatorial explosion of edge cases, of test cases, and there are so many of them. We want these to represent the what: what is being tested, what the behavior of the app is, not how we make the app perform these behaviors or run through sending a payment.

So we write the robot once. It holds all the interaction logic, and then we have these tests (hopefully many, because you’re all writing many tests) that are very terse, very declarative, and very high-level.

When things change, if it’s just the view that’s changing (the layout, or the way you input data), then only your robot has to change. But if it’s a business function, the overall structure of the app (its value proposition), or if you’re rearranging screens, those changes are going to be reflected in the tests, not the robot.
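To illustrate both halves of that claim, the Kotlin sketch below runs the three cases above as data against two hypothetical versions of the payment screen, a keypad and a redesigned text field. PaymentScreenDriver, KeypadScreen, TextFieldScreen, Case, and runAll are invented names, and both fakes simply fail anything above a $150 balance, so they do not distinguish a send limit from insufficient funds.

```kotlin
// Hypothetical "how" for two versions of the payment screen; the robot hides which one it drives.
interface PaymentScreenDriver {
    fun enterAmount(cents: Long)
    fun enterRecipient(email: String)
    fun send(): Boolean // true if the success screen appears
}

// Version 1: a number pad, one tap per digit (assumes the pad enters cents).
class KeypadScreen(private val balanceCents: Long) : PaymentScreenDriver {
    private val typed = StringBuilder()
    private var to = ""
    override fun enterAmount(cents: Long) { cents.toString().forEach { typed.append(it) } }
    override fun enterRecipient(email: String) { to = email }
    override fun send() = typed.toString().toLong() in 1L..balanceCents && "@" in to
}

// Version 2: a redesigned screen with a "dollars.cents" text field.
class TextFieldScreen(private val balanceCents: Long) : PaymentScreenDriver {
    private var amountText = ""
    private var to = ""
    override fun enterAmount(cents: Long) { amountText = "%d.%02d".format(cents / 100, cents % 100) }
    override fun enterRecipient(email: String) { to = email }
    override fun send(): Boolean {
        val (dollars, cents) = amountText.split('.')
        return dollars.toLong() * 100 + cents.toLong() in 1L..balanceCents && "@" in to
    }
}

// Written once: the only layer that knows which screen version it is driving.
class PaymentRobot(private val screen: PaymentScreenDriver) {
    fun amount(cents: Long) = apply { screen.enterAmount(cents) }
    fun recipient(email: String) = apply { screen.enterRecipient(email) }
    fun send() = ResultRobot(screen.send())
}

class ResultRobot(private val success: Boolean) {
    fun isSuccess() = check(success) { "Expected success but the payment failed" }
    fun isFailed() = check(!success) { "Expected a failure but the payment succeeded" }
}

// The "what": each case is one line of data, named after the tests in the text.
data class Case(val name: String, val cents: Long, val expectSuccess: Boolean)

val cases = listOf(
    Case("singleFundingSourceSuccess", 42_00, true),
    Case("singleFundingSourceTooMuch", 1_000_000_00, false),
    Case("singleFundingSourceInsufficientFunds", 1_000_00, false),
)

fun runAll(newScreen: () -> PaymentScreenDriver) {
    for (tc in cases) {
        val result = PaymentRobot(newScreen()).amount(tc.cents).recipient("foo@bar.com").send()
        if (tc.expectSuccess) result.isSuccess() else result.isFailed()
    }
}

fun main() {
    runAll { KeypadScreen(balanceCents = 150_00) } // original screen
    runAll { TextFieldScreen(balanceCents = 150_00) } // redesigned screen, same cases
}
```

Switching screens only changed the driver behind the robot; the list of cases stayed the same.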

Hopefully you can start seeing the parallel; this is why I focused on the architecture up front. There’s a reason for those architectures. If we take the ideas behind them and start applying them on the test side, our tests not only get more readable, but ultimately get more stable and more resilient to the inevitable changes in the things under test.

In our architecture, we have one robot per screen, and then our test count goes wild. We get that nice separation of concerns.

Kotlin Robots (16:30)

Our test is okay. It’s nice, declarative, and terse. But Kotlin can do a lot better. So what would we do if we wanted to turn these robots into Kotlin robots?

We want to leverage the language features that Kotlin provides while retaining type safety. We could do the easy thing, Command+Shift+Alt+K, to port straight into Kotlin. That doesn’t buy us anything; we can go further.

Generally, when you see a builder, you should immediately think about how Kotlin lets you replace it. Let’s try that.

First, let’s replace the builder with more advanced language primitives. Let’s get rid of our builder return types here; just hack ’em off. Before, we were dealing with object creation: we created the robot and called all these methods on it. We’re going to switch to a little factory function.

fun payment(func: PaymentRobot.() -> Unit) = PaymentRobot().apply { func() }

class PaymentRobot {
    fun amount(amount: Long) { ... }
    fun recipient(recipient: String) { ... }
    fun send(): ResultRobot { ... }
}

class ResultRobot { 
    fun isSuccess() { ... }
}

Calling func inside the apply block runs it with the new robot as its receiver, and apply then returns that robot itself. So payment { ... } evaluates to the robot rather than Unit, which lets us keep chaining.
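A small Kotlin sketch of why apply is the right scope function here: apply returns its receiver, while run returns the block's result, which for a Unit-returning block is just Unit. The recording PaymentRobot and paymentWithRun are hypothetical, for illustration only.

```kotlin
// Hypothetical robot that records calls instead of driving a real UI.
class PaymentRobot {
    val actions = mutableListOf<String>()
    fun amount(amount: Long) { actions += "amount=$amount" }
    fun recipient(recipient: String) { actions += "recipient=$recipient" }
}

// apply runs func with the new robot as its receiver, then returns the robot itself.
fun payment(func: PaymentRobot.() -> Unit): PaymentRobot = PaymentRobot().apply { func() }

// For contrast: run returns whatever the block returns, which here is Unit, so the robot is lost.
fun paymentWithRun(func: PaymentRobot.() -> Unit): Unit = PaymentRobot().run { func() }

fun main() {
    val robot = payment { amount(42_00) }
    println(robot.actions) // [amount=4200]
    val lost = paymentWithRun { amount(42_00) }
    println(lost) // kotlin.Unit
}
```

Since func already has the type PaymentRobot.() -> Unit, PaymentRobot().apply(func) is an equivalent way to write the factory.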

Now that we have this, what does this change our calling code into?

val result = payment {
    amount(42_00)
    recipient("foo@bar.com")
}.send()

result.isSuccess()

Well, we no longer need to call the constructor; we call our top-level payment function instead. Kotlin lets us pass a trailing lambda without parentheses, so we just put the block right there. Inside it, we can call the higher-level intention methods on the robot without any qualifier, because the block is a function literal with a receiver (PaymentRobot.() -> Unit).

Inside that block, we are acting as if we’re inside the robot class, so we can call its methods as if they’re sibling methods. When we call amount, it’s called on the robot. Same with recipient. Next, I call send() on the return value of that payment function.

This is why we did the apply trick. We still want the robot to be returned to us, so when we call payment with this block we get our original robot back. Then we call the send method which gives us the next robot that we can interact with, because this is changing screens and we want to move to a new robot.
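For comparison, this Kotlin sketch contrasts a plain lambda parameter, where the robot arrives as it, with a function type with receiver, where the robot is this. The recording PaymentRobot and paymentWithIt are hypothetical. Note that also passes its receiver to the block as a parameter, while apply makes it the block's receiver.

```kotlin
// Hypothetical robot that records calls so the two styles can be compared.
class PaymentRobot {
    val actions = mutableListOf<String>()
    fun amount(amount: Long) { actions += "amount=$amount" }
    fun recipient(recipient: String) { actions += "recipient=$recipient" }
}

// Plain function type: the robot arrives as a parameter, so every call goes through `it`.
fun paymentWithIt(func: (PaymentRobot) -> Unit): PaymentRobot = PaymentRobot().also(func)

// Function type with receiver: the robot is `this` inside the block, so calls need no qualifier.
fun payment(func: PaymentRobot.() -> Unit): PaymentRobot = PaymentRobot().apply(func)

fun main() {
    val a = paymentWithIt {
        it.amount(42_00)
        it.recipient("foo@bar.com")
    }
    val b = payment {
        amount(42_00)
        recipient("foo@bar.com")
    }
    println(a.actions == b.actions) // true
}
```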

What we can actually do is make a nice little visual chain so we don’t constantly need local variables for different robots. We’re taking the pattern we just used with payment, taking in a block that operates on a robot, and applying it to the send() method, which returns a new robot.

fun send(func: ResultRobot.() -> Unit): ResultRobot {
    // ...
    return ResultRobot().apply { func() }
}

We take in our lambda with receiver and apply it to the robot we were already returning. This changes how send behaves: it still returns the next robot, but now it also takes in a lambda, so we can tuck our little success check into that lambda and no longer need the local variable at all.

payment {
    amount(42_00)
    recipient("foo@bar.com")
}.send {
    isSuccess()
}

We call payment and do everything we want on the payment robot, scoped to that robot. The payment call still returns the robot to us, so we immediately call send on it and pass in all the behaviors for the next screen.
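A runnable Kotlin sketch of that whole chain, assuming hypothetical robots whose "how" is faked: the $150 default balance and the success rule stand in for a real app, and the extra balanceCents parameter on payment exists only so the sketch can run on its own.

```kotlin
// Hypothetical robots; the success rule and balance are invented so this runs without an app.
class ResultRobot(private val success: Boolean) {
    fun isSuccess() = check(success) { "Expected success but the payment failed" }
}

class PaymentRobot(private val balanceCents: Long) {
    private var amountCents = 0L
    private var to = ""
    fun amount(cents: Long) { amountCents = cents }
    fun recipient(email: String) { to = email }

    // send performs the screen transition, then runs the caller's block on the next screen's robot.
    fun send(func: ResultRobot.() -> Unit): ResultRobot {
        val success = amountCents in 1L..balanceCents && "@" in to // stand-in for tapping Send
        return ResultRobot(success).apply { func() }
    }
}

// Hypothetical: the balance is a parameter only so the sketch can run on its own.
fun payment(balanceCents: Long = 150_00, func: PaymentRobot.() -> Unit) =
    PaymentRobot(balanceCents).apply { func() }

fun main() {
    payment {
        amount(42_00)
        recipient("foo@bar.com")
    }.send {
        isSuccess()
    }
}
```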

If we wanted, we could keep chaining these things down. But we can do a tiny bit better. There’s a crazy little five-character language feature we can put before our send function, infix, which lets us call it like a binary operator: a member or extension function with a single parameter can be called as robot send { ... }, with no dot.

We’re going to abuse this to accomplish one little thing: take out the period between the closing brace and the send method.

payment {
    amount(42_00)
    recipient("foo@bar.com")
} send {
    isSuccess()
}

Now we have this beautifully terse, extremely descriptive block which describes the what that is being tested, not the how. How this test actually runs against the view is irrelevant to the test itself. The test only cares about the business logic of what is being tested, not the implementation of how your app executes it.
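The infix version can be sketched the same way. The robots are hypothetical stand-ins and the $150 balance is invented; the only change to send is the infix modifier, which Kotlin permits on a member or extension function that takes exactly one parameter.

```kotlin
// Hypothetical robots; the $150 balance is invented so the sketch runs without an app.
class ResultRobot(val success: Boolean) {
    fun isSuccess() = check(success) { "Expected success but the payment failed" }
}

class PaymentRobot {
    private var amountCents = 0L
    fun amount(cents: Long) { amountCents = cents }
    fun recipient(email: String) { /* hypothetical: would type into the recipient field */ }

    // `infix` is allowed because send is a member function with exactly one parameter.
    infix fun send(func: ResultRobot.() -> Unit): ResultRobot =
        ResultRobot(success = amountCents in 1L..150_00L).apply(func)
}

fun payment(func: PaymentRobot.() -> Unit) = PaymentRobot().apply(func)

fun main() {
    val result = payment {
        amount(42_00)
        recipient("foo@bar.com")
    } send {
        isSuccess()
    }
    println(result.success) // true
}
```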

General Robot Strategy (23:37)

The general pattern starts with an entry point, which is just the first screen you see, followed by whatever actions or assertions you want to perform on that view. We tend not to do a lot of assertions; we want to push those into unit tests.

We implicitly assert things by just walking through the app, and ensuring that it does what we expect, and that the things that we want to interact with are on the screen. isSuccess() is like an explicit assertion – you are asserting that something is being displayed on the screen.

Then our send() method represents any transition between screens, which is also a transition between robots. When the screen moves, we need to move on to another robot, and our fancy little infix function gets us that.

Descriptive Stack Traces (24:30)

Another cool thing – we kind of get some nice stack traces out of this that are very descriptive:

Exception in thread "main" java.lang.AssertionError:
    Expected <Success!> but found <Failure!>
    
    at ResultRobot.isSuccess(ResultRobot.kt:18)
    at PaymentTest.singleFundingSourceSuccess.2.invoke(PaymentTest.kt:27)
    at PaymentRobot.send(PaymentRobot.kt:13)
    at PaymentTest.singleFundingSourceSuccess(PaymentTest.kt:8)

You don’t have to click through to the test, find the line where it failed, and figure out what happened. The stack trace itself is very descriptive. We know we’re in the singleFundingSourceSuccess() test. We know the payment robot called send(), so it pressed the Send button on the payment screen.

Then we know the result robot was asserting success. So you get a stack trace that mimics the steps taken to get to, in this case, the failure. You potentially don’t even have to look at the test unless you need the data that was entered. Your stack traces replicate the path by which you reached the failing screen, which is a nice side effect.
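To see where those frames come from, here is a trimmed Kotlin sketch whose assertion fails inside the lambda passed to send. The robots and the injected outcome string are hypothetical. Because apply is an inline function, there is no separate apply frame between the lambda and send. The exact name of the lambda's own frame depends on the Kotlin compiler version, which is why the sketch filters by method name.

```kotlin
// Hypothetical robots trimmed to what the trace needs; the outcome is injected so the failure is deterministic.
class ResultRobot(private val shown: String) {
    fun isSuccess() {
        if (shown != "Success!") throw AssertionError("Expected <Success!> but found <$shown>")
    }
}

class PaymentRobot(private val outcome: String) {
    infix fun send(func: ResultRobot.() -> Unit): ResultRobot = ResultRobot(outcome).apply(func)
}

fun payment(outcome: String, func: PaymentRobot.() -> Unit) = PaymentRobot(outcome).apply(func)

// A test whose payment fails: the failing assertion runs inside the lambda passed to send.
fun singleFundingSourceSuccess() {
    payment("Failure!") {} send { isSuccess() }
}

fun main() {
    try {
        singleFundingSourceSuccess()
    } catch (e: AssertionError) {
        println(e.message)
        // Keep only the robot and test frames; lambda and runtime frames are filtered out.
        e.stackTrace
            .filter { it.methodName in setOf("isSuccess", "send", "singleFundingSourceSuccess") }
            .forEach { println("    at ${it.className}.${it.methodName}") }
    }
}
```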

An Alternative Robot (25:25)

I went through all the trouble of the infix function so you get nice chaining. That works for apps with very few branches in the logic under test. But if every screen in your app can lead to six other screens depending on state or input, defining these transitions in every single robot gets a little tedious.

This is the other way to apply the pattern. You can remove the explicit screen changes from your test and make them implicit. Let’s say that when you send a payment, you try to send a lot of money, and we want to verify your identity to make sure you’re not a fraudster.

payment {
    amount(4200)
    recipient("foo@bar.com")
    send()
} 
birthday {
    date(1970,1,1)
    next()
}
ssn {
    value("123-45-6789")
    next()
}
result {
    isSuccess()
}

So we ask for your birthday and your social security number. Instead of encoding those transitions in every potential robot, we make them implicit: the screens appear linearly in our test, so the code executes them linearly.

You still get the safety of transitions by having these entry-point functions assert something as they move along:

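import androidx.test.espresso.Espresso.onView
import androidx.test.espresso.assertion.ViewAssertions.matches
import androidx.test.espresso.matcher.ViewMatchers.*

fun payment(func: PaymentRobot.() -> Unit): PaymentRobot {
    onView(withText("0")).check(matches(isDisplayed())) // precondition: the payment screen is showing
    return PaymentRobot().apply { func() }
}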

If you’re going to do actions on the payment screen, then before I give you the robot and let your code run, I verify something about the payment screen to make sure it’s actually showing.

We’re just validating a precondition: that the test is where it expects to be before it interacts with the app. In this case, we just look for a label on the screen as a little validation. Sometimes it may not even be needed.
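The alternative style can be sketched in Kotlin with a fake app that tracks the current screen. Everything here is hypothetical: the FakeApp fields, the global app, the onScreen helper that performs the precondition check, and the rule that $1,000 or more triggers the birthday and SSN screens. Each entry point fails fast if its screen is not showing, which is what keeps the implicit transitions safe.

```kotlin
// Hypothetical fake app: which screen is showing, and whether the payment went through.
class FakeApp {
    var screen = "payment"
    var amountCents = 0L
    var paid = false
}

var app = FakeApp() // hypothetical global standing in for the running app

class PaymentRobot {
    fun amount(cents: Long) { app.amountCents = cents }
    fun recipient(email: String) { /* would type into the recipient field */ }
    // Invented rule for the sketch: $1,000 or more triggers an identity check first.
    fun send() {
        if (app.amountCents >= 1_000_00) app.screen = "birthday"
        else { app.paid = true; app.screen = "result" }
    }
}

class BirthdayRobot {
    fun date(year: Int, month: Int, day: Int) { /* would pick the date */ }
    fun next() { app.screen = "ssn" }
}

class SsnRobot {
    fun value(ssn: String) { /* would type the SSN */ }
    fun next() { app.paid = true; app.screen = "result" }
}

class ResultRobot {
    fun isSuccess() = check(app.paid) { "Expected <Success!> but the payment did not go through" }
}

// Shared precondition: fail fast if the expected screen is not showing, then run the block.
fun <R> onScreen(name: String, robot: R, func: R.() -> Unit): R {
    check(app.screen == name) { "Expected the $name screen but found ${app.screen}" }
    return robot.apply(func)
}

fun payment(func: PaymentRobot.() -> Unit) = onScreen("payment", PaymentRobot(), func)
fun birthday(func: BirthdayRobot.() -> Unit) = onScreen("birthday", BirthdayRobot(), func)
fun ssn(func: SsnRobot.() -> Unit) = onScreen("ssn", SsnRobot(), func)
fun result(func: ResultRobot.() -> Unit) = onScreen("result", ResultRobot(), func)

fun main() {
    payment {
        amount(1_500_00)
        recipient("foo@bar.com")
        send()
    }
    birthday {
        date(1970, 1, 1)
        next()
    }
    ssn {
        value("123-45-6789")
        next()
    }
    result {
        isSuccess()
    }
}
```

If a small payment skipped the identity screens, a test that still called birthday { } would fail at its precondition rather than at some later, more confusing step.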

The Robot Pattern (27:55)

Lastly, I want to emphasize that we should think of this as a very broad pattern. You’re welcome to interpret it how you want in order to make sure that you extract maximum value. There’s no library here. This is a pure and simple pattern.

We all agree that architecture within the app leads to long-term maintainable code. However, we never really talk about what’s on the other side, which is your test architecture. Usually it’s an afterthought, something that you’re always trying to play catch up on.

However, if you take the time (which is not that much) to think about the architecture of your tests, keeping the same separation of concerns that app architectures give you, you will end up with higher-quality, more maintainable tests that stay correct in the long term and actually save you from writing code.


Jake Wharton

Jake Wharton is an Android developer at Square working on Square Cash. For the past 5 years he's been living with a severe allergy to boilerplate code and bad APIs. He speaks at conferences all around the world to educate others about this terrible plague that afflicts many developers.