Can You Read Your Tests? Clean and Useful Android Testing, with JUnit and Spock!

Do you have unit tests? How readable are they? How much time do you spend maintaining them? Brittle and time-consuming tests lead to fear, uncertainty, and distrust, and that leads to an eventual disillusionment and abandonment.

In this 360AnDev talk, we will go over tools and techniques for writing tests that are a pleasure to read, easy to maintain, give you maximum return on the time you invest in them, and will prove their worth time and time again.

We’ll take a look at some example tests that are presenting some common issues, such as:

  • Readability
  • Brittleness (breaking every time the implementation is changed)
  • Non-obvious cause + solution upon test failure
  • Flakiness / unreliability
  • Slow execution

We’ll improve these tests in plain JUnit, move on to Hamcrest and AssertJ, and then show how Spock can take us even further to tests that are beautiful, easily maintainable, and incredibly useful. Stop wasting time on bad tests!

This talk aims to level-up every Android developer’s testing know-how and arm them with the tools required to write effective and maintainable tests.


Introduction (0:00)

This incredibly long title is all about cleaning up unit tests, with a little bit of Spock towards the end. But it’s mostly about just writing cleaner unit tests. There’s a lot of code here, and there’s also a lot of really bad tests. I’ve taken various examples from multiple locations such as public repos and anonymized some of them slightly.

Our First Example (0:31)

This is an enormous block of code, and this is supposedly a single test:

@Test
public void sharedPreferencesHelper_SaveAndReadPersonalInformation() {
    // Save the personal information to SharedPreferences
    boolean success = mMockSharedPreferencesHelper.savePersonalInfo(mSharedPreferenceEntry);

    assertThat("Check that SharedPreferenceEntry.save... return true",
            success, is(true));

    SharedPreferencEntry savedSharedPreferenceEntry =
            mMockSharedPreferencesHelper.getPersonalInfo();

    //Make sure both written and retrieved personal information are equal.
    assertThat("Checking that SharedPreferenceEntry.name has been persisted and read correctly",
            mSharedPreferenceEntry.getName(),
            is(equalTo(savedSharedPrefereceEntry.getName())));
    assertThat("Checking that SharedPreferenceEntry.dateOfBirth has been persisted and read "
            + "correctly",
            mSharedPreferenceEntry.getDateOfBirth(),
            is(equalTo(savedSharedPreferenceEntry.getDateOfBirth())));
    assertThat("Checking that SharedPreferenceEntry.email has been persisted and read "
            + "correctly",
            mSharedPreferenceEntry.getEmail(),
            is(equalTo(savedSharedPreferenceEntry.getEmail())));
}

Don’t try and parse too much of it. Your head will explode. This has all sorts of issues:

  1. There are comments all over it, and the comments are telling us that the code is not readable.
  2. We’ve got a wall of code here. It’s a huge block.
  3. We’re testing too many things. In fact, you can tell that because there’s an “and” in the test name, which is a dead giveaway.
  4. And we’ve got failure messages in here because the assertions are so unclear that we need to have these extra strings to tell us what actually failed, and they’re adding noise when you want to read the test.

So this is one of the methods the test is actually testing:

public boolean savePersonalInfo(SharedPreferenceEntry shareddPreferenceEntry) {
    // Start a SharedPreferences transaction.
    SharedPreferences.Editor editor = mSharedPreferences.edit();
    editor.putString(KEY_NAME, sharedPreferenceEntry.getName());
    editor.putLong(KEY_DOB, sharedPreferenceEntry.getDateOfBirth().getTimeInMillis());
    editor.putString(KEY_EMAIL, sharedPreferenceEntry.getName());

    // Commit changes to SharedPreferences.
    return editor.commit();
}

Or rather this is almost the method as it should be, but check out the problem with this which I’ve introduced:

editor.putString(KEY_EMAIL, sharedPreferenceEntry.getName());

If I’d copied and pasted one of these lines to add a new item, and then forget to change one part of it, it’s an easy mistake to make. But it’s okay because we have tests! That’s what they are for.

So we run the tests, and they pass.

Why? What’s going on?

At this point, I think it’s important to say there’re bad tests because they’re just unreadable, or because you can’t easily tell why it failed, but if the test doesn’t even fail, you’d be better off just deleting that test. It is giving you a false sense of confidence. So at this point, this test is in that camp.

That is a test that we’d be better off not having. So we’re going to see if we can turn it into something that we would actually like to have.

Firstly, we’re going to have a look at mMockSharedPreferencesHelper. That appears to be what we’re testing, although the weird thing is, it’s also being mocked at the same time. That’s kinda strange. Let’s take a look at where the variable gets initialized:

@Before
public void initMocks() {
    ...
    // Create a mocked SharedPreferences.
    mMockSharedPreferencesHelper = createMockSharedPreference();
    ...
}

From here we call createMockSharedPreference():

private SharedPreferencesHelper createMockSharedPreference() {
    // Mocking reading the SharedPreferences as if mMockSharedPreferences was previously
    // written correctly
    when(mMockSharedPreferences.getString(eq(SharedPreferencesHelper.KEY_NAME), anyString()))
            .thenReturn(mSharedPreferenceEntry.getName());
    when(mMockSharedPreferences.getString(eq(SharedPreferencesHelper.KEY_EMAIL), anyString()))
            .thenReturn(mSharedPreferenceEntry.getEmail());
    when(mMockSharedPreferences.getString(eq(SharedPreferencesHelper.KEY_DOB), anyLong()))
            .thenReturn(mSharedPreferenceEntry.getDateOfBirth().getTimeInMillis());

    // Mocking a successful commit.
    when(mMockEditor.commit()).thenReturn(mMockEditor);

    // Return the MockEditor when requesting it.
    when(mMockSharedPreferences.edit()).thenReturn(mMockEditor);
    return new SharedPreferencesHelper(mMockSharedPreferences);
}

It’s got quite a lot of setup. That again is a sign there’s something quite wrong here because there is a lot of fakery going on with this, too much really, for a simple mock. And when you end up with this much faking going on, you’re probably better off with a fake. But it’s even more troubling because in this sense we are testing this and we’re setting up fake behavior on it as well, which is a really strange conflict of interest.

There’s an interesting comment right at the top here which says that it will mock the reading as if it was written correctly. And that is exactly what we’ve just seen. It said that you could save and then read, and it just passed as if we’d written it correctly when we didn’t.

As I said, we’re doing two things here. So an obvious step, we’ve got Save, and we’ve got Read. If we split those into two separate tests, we are already 100% better off:

@Test
public void savesPersonalInformation() {
    // Given
    sharedPrefsSaveWillSucceed();
    // When
    boolean saveSuccess = sut.savePersonalInfo(mSharedPreferenceEntry);
    // Then
    verify(mMockEditor).putString(KEY_NAME, mSharedPreferenceEntry.getName());
    verify(mMockEditor).putString(KEY_Email, mSharedPreferenceEntry.getEmail());
    verify(mMockEditor).putLong(KEY_DOB, mSharedPreferenceEntry.getDateOfBirth().getTimeInMillis());
    verify(mMockEditor).commit();
    assertThat(saveSuccess, is(true));
}

@Test
public void readsPersonalInformation() {
    // Given
    givenSharedPrefsContains(mSharedPreferenceEntry);
    // When
    SharedPreferenceEntry retrieveInfo = sut.getPersonalInfo();
    // Then
    assertThat(retrievedInfo.getName(), is(equalTo(mSharedPreferenceEntry.getName())));
    assertThat(retrievedInfo.getDateOfBirth(), is(equalTo(mSharedPreferenceEntry.getDateOfBirth())));
    assertThat(retrievedInfo.getEmail(), is(equalTo(mSharedPreferenceEntry.getEmail()));
}

Now that’s more readable, and it’s testing one thing in each test. Although we’ve got a lot of assertions, or verifications, it’s essentially testing one thing in each test.

We run them and great; we have a failing test. So there we go. We’re done, right?

Yes, okay, this meets the base level, but we can do a lot more than this. Let’s just focus on savesPersonalInformation():

@Test
public void savesPersonalInformation() {
    // Given
    sharedPrefsSaveWillSucceed();
    // When
    boolean saveSuccess = sut.savePersonalInfo(mSharedPreferenceEntry);
    // Then
    verify(mMockEditor).putString(KEY_NAME, mSharedPreferenceEntry.getName());
    verify(mMockEditor).putString(KEY_Email, mSharedPreferenceEntry.getEmail());
    verify(mMockEditor).putLong(KEY_DOB, mSharedPreferenceEntry.getDateOfBirth().getTimeInMillis());
    verify(mMockEditor).commit();
    assertThat(saveSuccess, is(true));
}

We’ve got a whole thing here where we’re checking that the method will return a success value if this happens. That is separate from the values actually being written. So we can pull it out into a different test:

@Test
public void savesPersonalInformation() {
    // Given
    sharedPrefsSaveWillSucceed();
    // When
    sut.savePersonalInfo(mSharedPreferenceEntry);
    // Then
    verify(mMockEditor).putString(KEY_NAME, mSharedPreferenceEntry.getName());
    verify(mMockEditor).putString(KEY_Email, mSharedPreferenceEntry.getEmail());
    verify(mMockEditor).putLong(KEY_DOB, mSharedPreferenceEntry.getDateOfBirth().getTimeInMillis());
    verify(mMockEditor).commit();
}

@Test
public void reportSuccessWhenSavingPersonalInfoSucceeds() {
    //Given
    sharedPrefsSaveWillSucceed();
    // When
    boolean saveSuccess = sut.savePersonalInfo(mSharedPreferenceEntry);
    // Then
    assertThat(saveSuccess, is(true));
}

If this one fails, we know that part is broken. If the other one fails, we know that part is broken. We want to know which bit is broken immediately when a test fails. And we want the test to fail for one reason.

So we have cut savesPersonalInformation() to a few lines. Okay, it’s not too bad. We can read that pretty easily, but we could go a little bit further. We could modify it to where we can check all of those values in one place, and it will read very nicely:

@Test
public void savesPersonalInformation() {
    // Given
    sharedPrefsSaveWillSucceed();
    // When
    sut.savePersonalInfo(mSharedPreferenceEntry);
    // Then
    assertThat(mMockEditor, savedAllThePersonalInfo());
}

And if we need the detail, we can very easily see what that’s checking. So that’s probably the only extra step we might want to take down.

Okay, so, clean tests. It’s this easy, right?

Second Example (5:59)

So I’ve seen this written as a clean test because it’s very easy to read:

public void givenNoCriteriaWhenFindingMessagesForCleaningThenResultsShouldBeEmpty()
        throws InterruptedException {

    givenNoCriteria();

    whenFindingMessagesForCleaning();

    thenResultShouldBeEmpty();
}

But the thing is, I can’t see what it’s testing. I can’t see how it’s testing it. I can’t see how it’s checking its assertions. I actually can’t see it doing anything; there are just three sentences there. And this is based on a real example. It’s clean.

It’s maybe a little too clean. It’s kinda been bleached. So we take the first two methods out and put back, basically, what they were doing: setting up some test data and calling the thing we’re actually looking to test. This is very crucial:

public void givenNoCriteriaWhenFindingMessagesForCleaningThenResultsShouldBeEmpty()
        throws InterruptedException {

    messagesInfoResult = messagesFinder.findMessagesForCleaning(NO_CRITERIA);

    thenResultShouldBeEmpty();
}

Then we take thenResultShouldBeEmpty(), which just had a simple assertion. Let’s put it back in, because that is actually designed to be readable:

public void givenNoCriteriaWhenFindingMessagesForCleaningThenResultsShouldBeEmpty()
        throws InterruptedException {
    messagesInfoResult = messagesFinder.findMessagesForCleaning(NO_CRITERIA);
    assertThat(messagesInfoResult).isEmpty();
}

So this is pretty straightforward. We can understand this. If we wanted to go maybe a little step further, we could merge this into one line because technically that’s a readable line: assert that findMessagesForCleaning with NO_CRITERIA, it returns us something empty. That’s easy; we can understand this:

public void givenNoCriteriaWhenFindingMessagesForCleaningThenResultsShouldBeEmpty()
        throws InterruptedException {
    assertThat(sut.findMessagesForCleaning(NO_CRITERIA)).isEmpty();
}

It’s pretty much plain English, which raises the question why we need a test name that tries to express all of that:

public void findsNoMessagesWhenNoCriteria()
        throws InterruptedException {
    assertThat(sut.findMessagesForCleaning(NO_CRITERIA)).isEmpty();
}

We don’t need a test name that is a paragraph. We can have a very simple little test name that tells us very simply what this is trying to say about the thing we’re testing. We don’t need all the criteria. We don’t need all the conditions and all the things we’re trying to check all in a test name, which should be something very readable. And it can be when the body of our test actually expresses this stuff.

Last Example (8:06)

Okay, so, one last example. Don’t be too relieved, because it is a big example:

public void givenUsersAndJobs_whenGettingJobsForMultipleUsers_thenWeGetJobsMappedToExternalUsers() {

    givenJobs(Arrays.asList(ANY_JOB, ONE_MORE_JOB), ANY_USER);
    givenJobs(Collections.singletonList(YET_ANOTHER_JOB), YET_ANOTHER_USER);

    TestSubscriber<Map.Entry<String, List<Job>>> testSubscriber = new TestSubscriber<>();
    List<String> someSystemUsers = Arrays.asList(ANY_USERNAME, YET_ANOTHER_USERNAME);

    serviceClient.getJobsForSystemUsers(someSystemUsers, ANY_START_DATE, ANY_END_DATE)
            .subscribeOn(Schedulers.immediate())
            .subscribe(testSubscriber);

    List<Map.Entry<String, List<Job>>> expectedJobs = Array.asList(
            (Map.Entry<String, List<Job>>)new AbstractMap.SimpleImmutableEntry<>(
                    ANY_USERNAME, Arrays.asList(ANY_JOB, ONE_MORE_JOB)
            ),
            new AbstractMap.SimpleImmutableEntry<>(
                    YET_ANOTHER_USERNAME, Collections.singletonList(YET_ANOTHER_JOB)
            )
    );

    tesSubscriber.assertReceivedOnNext(expectedJobs);
}

Speaking of long test names; my brain fails to parse this. I have to read it multiple times to try and understand what it’s trying to say.

Given, Then, When (8:16)

But I’ll get to that test name in a moment, because the test body, it’s not easy to read, and that’s why we’ve ended up with a test name that’s trying to make up for it. There’s a lot of things we can do to this test.

Let’s start with trying to find what’s actually being tested:

serviceClient.getJobsForSystemUsers(someSystemUsers, ANY_START_DATE, ANY_END_DATE)
        .subscribeOn(Schedulers.immediate())
        .subscribe(testSubscriber);

That is what we’re doing that we want to check the results of. I’m going to set up some fixtures before it. We want to check something after it so we will switch some lines around and we have:

public void givenUsersAndJobs_whenGettingJobsForMultipleUsers_thenWeGetJobsMappedToExternalUsers() {

    // Given
    givenJobs(Arrays.asList(ANY_JOB, ONE_MORE_JOB), ANY_USER);
    givenJobs(Collections.singletonList(YET_ANOTHER_JOB), YET_ANOTHER_USER);

    TestSubscriber<Map.Entry<String, List<Job>>> testSubscriber = new TestSubscriber<>();
    List<String> someSystemUsers = Arrays.asList(ANY_USERNAME, YET_ANOTHER_USERNAME);

    List<Map.Entry<String, List<Job>>> expectedJobs = Array.asList(
            (Map.Entry<String, List<Job>>)new AbstractMap.SimpleImmutableEntry<>(
                    ANY_USERNAME, Arrays.asList(ANY_JOB, ONE_MORE_JOB)
            ),
            new AbstractMap.SimpleImmutableEntry<>(
                    YET_ANOTHER_USERNAME, Collections.singletonList(YET_ANOTHER_JOB)
            )
    );

    // When
    serviceClient.getJobsForSystemUsers(someSystemUsers, ANY_START_DATE, ANY_END_DATE)
            .subscribeOn(Schedulers.immediate())
            .subscribe(testSubscriber);

    // Then
    testSubscriber.assertReceivedOnNext(expectedJobs);
}

And now we’ve got all our fixtures and in the “given, when, then” order like it’s being used in the actual title of the test. It’s a little bit clearer already.

sut.getJobsForSystemUsers(someSystemUsers, ANY_START_DATE, ANY_END_DATE)
        .subscribeOn(Schedulers.immediate())
        .subscribe(testSubscriber);

I tend to name the thing that I’m testing sut; that’s a personal preference of mine. It’s a common abbreviation, “System Under Test”. The only reason I do this, and this is very much a personal preference thing, I find that if I name that one in a unit test sut, then everything else can just be named very plainly if it’s a collaborator. I don’t need to try and give it a name like a mock or a fake something. Everything else that doesn’t say sut must be some sort of collaborator. It must be some double for a test. The thing that’s sut, that’s obviously the thing I’m testing in any one of my unit tests.

So, what next?

Testing RX (10:16)

TestSubscriber<Map.Entry<String, List<Job>>> testSubscriber = new TestSubscriber<>();

...

sut.getJobsForSystemUsers(someSystemUsers, ANY_START_DATE, ANY_END_DATE)
        .subscribeOn(Schedulers.immediate())
        .subscribe(testSubscriber);

...

testSubscriber.assertReceivedOnNext(expectedJobs);

Well, this is something you might’ve noticed, TestSubscriber. It’s testing an RX based class. That is a slightly more modern example, and it’s quite common. You see the TestSubscriber is pretty much just adding noise. We don’t care about there being the ability to have a test subscriber and what scheduler we’re subscribing on just because we’re in a test, and we want it to execute immediately. We don’t care about any of that. For the purpose of our test, it’s noise, and if we can get rid of it, then great.

Sprinkle in some AssertJ (10:56)

public class TestSubscriberAssert<T>
        extends AbstractObjectAssert<TestSubscriberAssert<T>, TestSubscriber<T>> {

    public static <T> TestSubscriberAssert<T> assertThat(TestSubscriber<T> subscriber) {
        return new TestSubscriberAssert<T>(subscriber);
    }

    public static <T> TestSubscriberAssert<T> assertThatASubscriberTo(Observable<T> observable) {
        TestSubscriber<T> subscriber = new TestSubscriber<>();
        observable
                .subscribeOn(Schedulers.immediate())
                .subscribe(subscriber);
        return new TestSubscriberAssert<T>(subscriber);

    }

    public TestSubscriberAssert(TestSubscriber<T> actual) {
        super(actual, TestSubscriberAssert.class);
    }

    public TestSubscriberAssert received(T... values) {
        actual.assertValues(values);
        return this;
    }
}

AssertJ is fantastic for this kind of stuff. There are other systems for doing this. You could use Hamcrest. You could use Fest or something, although AssertJ is essentially a more modern version of Fest so I don’t know why you would.

You can see we have a simple little static factory method called TestSubscriberAssert <T> assertThatASubscriberTo that will give us back an assertion object. And you can see we’ve placed in the scheduling and the subscription in there, so we can keep all of that out of the tests. Whenever you’re doing anything with RX, you’re going to want to use this code multiple times, so why write it in every single test and have it polluting with noise? And then we can have these simpler methods for checking things, similar to TestSubscriberAssert received.

So if we use this class and place it in our test we can simplify our code:

public void givenUsersAndJobs_whenGettingJobsForMultipleUsers_thenWeGetJobsMappedToExternalUsers() {

    // Given
    givenJobs(Arrays.asList(ANY_JOB, ONE_MORE_JOB), ANY_USER);
    givenJobs(Collections.singletonList(YET_ANOTHER_JOB), YET_ANOTHER_USER);

    List<String> someSystemUsers = Arrays.asList(ANY_USERNAME, YET_ANOTHER_USERNAME);

    List<Map.Entry<String, List<Job>>> expectedJobs = Array.asList(
            (Map.Entry<String, List<Job>>)new AbstractMap.SimpleImmutableEntry<>(
                    ANY_USERNAME, Arrays.asList(ANY_JOB, ONE_MORE_JOB)
            ),
            new AbstractMap.SimpleImmutableEntry<>(
                    YET_ANOTHER_USERNAME, Collections.singletonList(YET_ANOTHER_JOB)
            )
    );

    // When
    Observable<Map.Entry<String, List<Job>>> observable =
        sut.getJobsForSystemUsers(someSystemUsers, ANY_START_DATE, ANY_END_DATE);

    // Then
   assertThatASubscriberT(observable).received(expectedJobs);
}

So that’s a little bit smaller. But there is still a lot of confusion being created when dealing with the lists. The Arrays.asList and Collections.singletonList are just adding all sorts of noise, especially when mixing in all the generics. So we can get rid of all of that.

We can introduce simple little methods that just do exactly what those things did. So we can put in a method, aList, and all it contains in it is the arrays, the new list. We can also create aMapEntry, that would just create us a map entry. It cleans these methods up a little bit. The result would be:

public void givenUsersAndJobs_whenGettingJobsForMultipleUsers_thenWeGetJobsMappedToExternalUsers() {

    // Given
    givenJobs(aList(ANY_JOB, ONE_MORE_JOB), ANY_USER);
    givenJobs(aList(YET_ANOTHER_JOB), YET_ANOTHER_USER);

    List<String> someSystemUsers = aList(ANY_USERNAME, YET_ANOTHER_USERNAME);

    List<Map.Entry<String, List<Job>>> expectedJobs = aList(
            aMapEntry(ANY_USERNAME, aList(ANY_JOB, ONE_MORE_JOB)),
            aMapEntry(YET_ANOTHER_USERNAME, aList(YET_ANOTHER_JOB))
    );

    // When
    Observable<Map.Entry<String, List<Job>>> observable =
            sut.getJobsForSystemUsers(someSystemUsers, ANY_START_DATE, ANY_END_DATE);

    // Then
    assertThatASubscriberT(observable).received(expectedJobs);
}

That is a bit more readable. We could even simplify this method even more by placing the expectedJobs list, and the getJobsForSystemUsers method call into a single line because it’s still quite readable:

public void givenUsersAndJobs_whenGettingJobsForMultipleUsers_thenWeGetJobsMappedToExternalUsers() {

    // Given
    givenJobs(aList(ANY_JOB, ONE_MORE_JOB), ANY_USER);
    givenJobs(aList(YET_ANOTHER_JOB), YET_ANOTHER_USER);

    List<String> someSystemUsers = aList(ANY_USERNAME, YET_ANOTHER_USERNAME);

    // When / Then
    assertThatASubscriberTo(
            sut.getJobsForSystemUsers(someSystemUsers, ANY_START_DATE, ANY_END_DATE)
    ).received(aList(
            aMapEntry(ANY_USERNAME, aList(ANY_JOB, ONE_MORE_JOB)),
            aMapEntry(YET_ANOTHER_USERNAME, aList(YET_ANOTHER_JOB))
    ));
}

For this part of the process we are focusing on what the method looks like, so let’s take a closer look at the methods up top:

givenJobs(aList(ANY_JOB, ONE_MORE_JOB), ANY_USER);
givenJobs(aList(YET_ANOTHER_JOB), YET_ANOTHER_USER);

I don’t really understand these. givenJobs, there’s a list and givenJobs, another list. And it’s weird because actually what it seems to be doing is given jobs for this user and jobs for this other user, but the user, the most important part here, is just tacked on the end. So I would just switch the parameters order around. I would also recommend var args here so we can get rid of a little bit more noise. Here’s our result:

givenJobsForUser(ANY_USER, ANY_JOB, ONE_MORE_JOB);
givenJobsForUser(YET_ANOTHER_JOB, YET_ANOTHER_USER);

Okay, great. It’s a little bit clearer, but one of the major problems I think with this test is, I can’t really see the sense in what it’s testing. So although superficially, we’ve cleaned it up a lot, you can’t understand how these fixtures work, and why parts work how they do.

I’m using, presumably, some sort of user object, and then I’m using a username. Does the ANY_USERNAME correspond to ANY_USER? Possibly. Their names are kinda similar, but I don’t know. I can’t really see that it does. If I look elsewhere in this test class, I can find a whole bunch of constants at the top where all these users have been defined:

private static final User ANY_USER = givenAUser(ANY_PERSON_ID, ANY_SYSTEM_USERNAME);
private static final User ANY_OTHER_USER = givenAUser(13, ANY_OTHER_SYSTEM_USERNAME);
private static final User YET_ANOTHER_USER = givenAUser(74, YET_ANOTHER_SYSTEM_USERNAME);

I can find some of the answers in setUp where it sets up some people and it sets up some mapping between usernames, but these are all part of what is essentially missing from our test.

public void setUp() throws Exception {
    initMocks(this);

    givenPersons();

    setUsernameConversion(ANY_SYSTEM_USERNAME, ANY_EXTERNAL_USERNAME);
    setUsernameConversion(ANY_OTHER_SYSTEM_USERNAME, ANY_OTHER_EXTERNAL_USERNAME);
    setUsernameConversion(YET_ANOTHER_SYSTEM_USERNAME, YET_ANOTHER_EXTERNAL_USERNAME);

    ...
}

private void setUsernameConversion(String systemUsername, String externalUsername) {
    when(mockExternalToSystemUserConverter.getExternalUser(systemUsername))
            .thenReturn(externalUsername);
    when(mockExternalToSystemUserConverter.getSystemUser(externalUsername))
            .thenReturn(systemUsername);
}

It’s been removed because someone was trying to remove duplication, but they’ve also removed parts of the test that show how it really works.

Think of it like a sentence or a paragraph of text that you’ve taken out all of the words that would link the sentences together. You can’t understand the context anymore. If we put them back in, we’d end up with something more like this:

@Test
public void givenUsersAndJobs_whenGettingJobsForMultipleUsers_thenWeGetJobsMappedToExternalUsers() {
    //Given
    List<Job> jobsForUser1 = aList(aJobForUser(USER_1, "job1"), aJobForUser(USER_2, "job2"));
    List<Job> jobsForUser2 = aList(aJobForUser(USER_2, "job3"), aJobForUser(USER_2, "job4"));

    given(jobService)
        .hasJobsForUser(USER_1, jobsForUser1);
        .hasJobsForUser(USER_2, jobsForUser2);

    given(usernameMappingService)
        .hasMapping(USER_1_EXTERNAL_USERNAME, USER_1.getUsername());
        .hasMapping(USER_2_EXTERNAL_USERNAME, USER_2.getUsername());

    List<String> externalUsernames = aList(USER_1_EXTERNAL_USERNAME, USER_2_EXTERNAL_USERNAME);

    //When / Then
    assertThatASubscriberTo(
            sut.getJobsForSystemUsers(externalUsernames, ANY_START_DATE, ANY_END_DATE)
    ).received(aList(
            aMapEntry(ANY_USERNAME, jobsForUser1),
            aMapEntry(YET_ANOTHER_USERNAME, jobsForUser2)
    ));
}

And at this point, we can actually understand what it was that this was trying to verify. It might actually be trying to verify too much all at once, but we can read this.The crucial thing to understand here is that, once we can read it and we can understand it.

Let’s focus our attention on the name of the test. That is way too much. I can’t read it. We need a simple name. What does this thing do? It finds jobs for users by external usernames. Okay, and if this test fails, then I know that it no longer can find jobs for users by external usernames. So let’s change the name from:

givenUsersAndJobs_whenGettingJobsForMultipleUsers_thenWeGetJobsMappedToExternalUsers()

to:

findJobsForUsersByExternalUsernames()

That is a simple little name. It would make sense immediately if I saw it was failing.

Common Test Code Smells (17:46)

So, having gone through all of those, some common test code smells, and I am a contract and freelance guy, so I tend to move around a bit, and I see a lot of these things.

  • Comments: if you see comments in your tests, and this does include some testing frameworks I’m not so keen on that tend to use strings a lot in the descriptions of things, and it does tend to have the same effect. When you can write these plain strings for your given bit and then for your when bit, they aren’t actually executable code. They’re just a description of what the code does. They do the same thing as comments in that they’re an excuse for not making the code expressive. And so it’s okay if they’re contributing to a report somehow, that’s kinda nice, but they mustn’t be an excuse for the code itself telling you what it’s doing. Comments will get out of sync, and that’s no less true for tests.
  • Big unreadable (often GWT) test names: if we get these big, unreadable test names, I can’t easily parse that, so it’s slowing down the amount of time that it’s going to take me to figure out what’s going on when the test fails.
  • “AND” in test name: if we see “and” in a test name, that’s gotta be one of the worst. You see this a lot, and that’s a big red flag. You get a huge block of assertions, and this one actually, I think this is generally acknowledged. If you see a lot of assertions in a test, that’s a bad sign. It’s probably testing too much, and even if it’s not and it’s just trying to test lots of facets of the same thing, then maybe it can be expressed in a more clean way.
  • Many huge assertions, custom messages: likewise, if there’re custom messages there, they’re just going to get in the way of you actually parsing the test when you’re trying to make sense of it.
  • Mocking non-collaborators: if you’re seeing mocking of things that are not collaborators, so mocking the thing you’re testing, that’s a big problem, and it’s probably going to cause you trouble. Also, I’ve seen people mocking value objects. Your value objects ideally would be, if they are really just tiny little value objects, they’d be immutable, and you can happily create those and use those in a test. You don’t need to mock them, and if you do, you’re probably going to introduce a lot of pain.
  • Mocks returning mocks, over-complicated faking: if you see mocks that are returning mocks, because this just tends to be, this is one of those rules of thumb, if you see a mock return a mock, probably your mocks are too complicated at this point. You’re probably doing too much. If you see a lot of overcomplicated setup of scenarios on your mocks, like lots and lots of when it does this, then it does this, and you see big blocks of this, maybe just write a fake instead. A simple fake that’s not using any kind of framework, because it’s going to be easier to read and understand and reuse.
  • Absence of clear connection between test clauses: finally, and this is one, as I said, you see these tests that sometimes read clearly, but you don’t really understand how it tests something, and that’s usually because the connection between the causes in the test has been lost.

Best Practices when Testing (21:00)

My main point is that test readability is something that I find is overlooked. I’ve found plenty of people will write really clean, expressive production code but there’s this kind of, something about the word “production” that bestows a certain honor upon that code that that must be clean. But the test code, that’s not what we’re shipping, that’s fine, it doesn’t need to be as good. It should be better! The tests have to be maintained just the same, and the tests are the spec for that code. So even if you couldn’t read the production code, you could read the spec, and it would tell you what it does. That’s what your tests are. So it should be more readable.

But it’s important when you’re trying to get rid of all the noise you don’t actually lose the nouns, the fixtures that you’re setting up. So if you’re using test data and you’re setting up that test data, and then you’re using it, and you’re expecting it, you need to be able to see how that thread runs through the test.

Most importantly, the test really, really does have to fail if the code is broken. You can even make this into a fun game. I used to work with a guy who was quite fond of this game. You do a ping pong pairing where I write a test, you write the implementation, you write test, I write the implementation.

Okay, well once you’ve written that implementation and we’ve got tests on it and everything, I’m going to try and find a way to break the implementation, and if the tests don’t fail, we’ve done something wrong. So it can be an interesting thought exercise little thing, and if you’re feeling a bit devious and you need to get some of that out, that’s a great, great way to do so. Pick somebody’s code and see if you can break their code but without breaking their tests. They’ll hate you a little bit for it, but they’ll also appreciate it.

And when it does fail, the messages need to be clear, so you need to check those messages. We have this whole red, green, refactor thing, but it’s not just red, it’s okay red, and what does the failure message tell me? Is that a nice, expressive, clean failure message? If I saw this failing, would I immediately know what the problem was? Can I make it clearer? And I don’t mean by putting in big strings because it’s not giving you a good enough message. Usually, if you use something like a matcher or AssertJ or something, it will give you cleaner messages. It will explain why things did not match and what the things were. And probably a test should be that small that it’s kinda obvious.

Spock (23:41)

So, this seamlessly brings me onto the Spock. This is a little bit old now, I have to admit, and it hasn’t received so much love lately, but the reason I’m so keen on it is that I haven’t seen anything better yet, and I’ve tried a few things. It’s got reasonable documentation, but that documentation is on Google Code which indicates it’s not exactly being maintained.

Spock Basics (24:24)

def "subscribers see published events at least once"() {
    when: publisher.send(event)
    then: (1.._) * subscriber.receive(event)
    where: event << ["starter", "paused", "stopped"]
}

def "copes with misbehaving subscribers"() {
    given:
    firstSubscriber.receive(_) >> { throw new Exception() }

    when:
    publisher.send("event1")
    publisher.send("event2")

    then:
    1 * latterSubscriber.receive("event1")
    1 * latterSubscriber.receive("event2")
}

This is the kind of thing that you can write in the tests because it’s based on Groovy, which sounds horrible and scary, but we’re not building Android apps in Groovy here. This is what’s going to run on a JVM. So what we’re really talking about is Java but with the ability to use a bit of DSL. There’s some messing with the syntax tree where it can rearrange lines so you can write your tests in a more natural way.

Spock has the “given, when, then” structure baked into it, so it recognizes these as parts of your test. For instance, in the “then” clause, if you try and put something in there that’s trying to do the wrong thing, it will say, “Well that’s not really what I expected. “I expected something that evaluates that I can check, and that’s not what you’ve put here.” So it keeps you true on your structure of a test. But you have the flexibility; you can say “when.” You don’t have to start with a given. You could put “where” and you can substitute in values. This allows you to do your nice, easy mocking and stubbing along with various other actions, and it has very nice syntax for this because they’ve taken advantage of the DSL capabilities.

def "reads personal info"() {
    given:
    sharedPrefsContains(expectedEntry)

    when:
    def retrievedEntry = sharedPreferencesHelper.getPersonalInfo()

    then:
    expect retrievedEntry, matchesEntry(expectedEntry)
}

For example, if you are using Hamcrest, it’s got basic support for it where you can use matchers. And as I mentioned, because it’s Groovy, essentially DSL stuff and metaprogramming, you can rearrange things in ways that they just read more naturally when you know what it’s actually doing. So it just gets rid of a tiny little bit more of that noise.

def "reads personal info"() {
    given:
    sharedPrefsContains(expectedEntry)

    when:
    def retrievedEntry = sharedPreferencesHelper.getPersonalInfo()

    then:
    expect retrievedEntry matches expectedEntry // check(retrievedEntry).matches(expectedEntry)
}

You can also put extension methods onto things, which, especially with things like Kotlin now, is becoming very popular and very, very useful for testing. Also, a good reason why Kotlin is actually very nice for running tests.

when: flippable.flip()
then: flippable.front == old(flippable.back)
and: flippable.back == old(flippable.front)

So let’s say you have this. And this is something which, I haven’t found a way to do this crazy stuff in Kotlin, and I like tests written in Kotlin, but this comes from Groovy’s ability to manipulate the syntax tree within the language. But this also makes it hard for IDEs to deal with it historically, but it allows it to do crazy stuff like this old thing. So I can verify against previous return values. I can write it in a way that is how I would say this without saying, “Well I need to store the value. Then I need to do this. Magic.

Failure Messages in Spock (27:49)

This is one thing I absolutely love about Spock. The failure messages are so detailed. It will give you the expression that failed verification. And it’ll break down every last little bit of the thing. So it’ll tell you, “This is this object. “This value was this. This is false. It did not equal. And there were nine differences, and if you really care, it was a 62% similarity,” which is going a bit above and beyond, but this fantastic breakdown means that the moment you get the failure, you generally know exactly what happened, which is invaluable.

Features galore on Spock (28:37)

So it’s got spying. For instance, I’m verifying that this thing must’ve been called once.

def "uses spy"() {
    when: thing.doSomethingWith(collaborator)
    then: 1 * collaborator.callWeExpect()
}

It’s got stubbing. I can say, “Well if somebody calls this, return the number four”:

def "uses stub"() {
    given: numberGenerator.getValue() >> 4
    when: result = thing.doThingWith(numberGenerator)
    then: result == 42
}

It’s got wildcards with a nice, natural way of doing it, so you got these underscores for wildcards:

_.receive(event)    // receive is called on any mock object
0 * _._             // No more interactions
0 * _               // Same ^^^

It’s got ranges of things so that I could say, “Well this thing was called at least one time but up to,” “ who cares?” Or, “It was called at most two times”. It’s pretty flexible. It’s nice syntax for this:

3 * subscriber.receive(event)       // exactly 3 times
(1.._) * subscriber.receive(event)  // at least 1 time
(_..2) * subscriber.receive(event)  // at most 2 times

I wouldn’t recommend using this, but you can actually use regexes. For example, you could say, “No methods looking like this were called,” but it’s not easy to parse, so maybe not:

0 * _./set.*/(_)    // no setter is called on any mock

Faking in Spock (29:38)

It’s got all sorts of stuff for faking if you need to set up fakes and things. Again, if you’re going to be using a fake a lot, probably just create a proper fake anyway, but if you’re just using it once or twice, and it’s a little bit, then you can do all sorts of interesting stuff here:

pickySubscriber.receive(_) >> { event -> event.priority > 3 ? "ok" : "fail" }
badSubscriber.receive(_) >> { throw new InternalError("I'm a troublemaker!") }

subscriber.receive(_) >> ["ok", "fail", "ok"] >> { throw new InternalError() } >> "ok"

1 * subscriber.receive(!null)                   // arg must not be null1
1 * subscriber.receive(!unwantedEvent)          // arg must not be equal to unwantedEvent
1 * subscriber.receive( { it.priority >= 5 } )  // arg must have priority 5 or above

For instance, when you call this, you’re going to call it with a parameter. I’m going to capture that parameter and then I’m going to have a look at it and check that this thing, if it was above three, then I’m going to return this. Otherwise, I’m going to return this. In Java, you’d use Mockito for instance, and you’d have an ArgumentCaptor. There’re all sorts of noise that doesn’t have anything to do with what you’re actually trying to do. So, and in this case, I can just say, “Well when you call this, throw this.” I could even say, “When you call this the first three times you’re going to get this, this, and then this. Then the fourth time it’s going to throw an exception. Then the fifth time it’s going to return this.” And it’s so succinct to say these things. And again, it comes from having a language that allows you to use DSL, but also the fact that I have built in the mocking.

Data Driven Testing (30:43)

This is one of my favorite things. So I don’t know if anyone has ever written data driven tests in JUnit, but frankly, I would say you’re better off not doing so. They’re very, very unreadable. In fact, I would generally always prefer to write maybe a bunch of tests that just call another method with parameters. That would be much clearer in JUnit than writing a parameterized test, because they’re ugly. This one, on the other hand, I can write a data table into my test:

def "maximum of #a and #b is #result"() {
    expect:
        Math.max(a, b) == result

    where:
        a | b || result
        0 | 0 || 0
        0 | 9 || 9
        -2| 1 || 3
}

I can also put nice little hash things in here in the name, and because I’ve said “unroll” this is basically going to run this test for every single one of these rows in the table, and it will show each one as a separate test. And as you can see, it actually gives it the name with these two substituted, so if that fails in that specific case, I can see that it failed in that specific case. And again, there’s a whole failure breakdown again. So I can see everything that was involved, why it doesn’t match.

Spock Recap (32:04)

  • Groovy-based: as it is groovy based I would not recommend you go write your Android apps in Groovy. You can, but I think that’s a lot of pain. But for writing things on the JVM different rules apply, and in this case, you’re pretty much writing Java, but you’ve just got some of the capabilities that DSL and the metaprogramming that is added in.

  • Expressive DSL abilities

  • You’ve got magic, which is great.

  • IDE syntax highlighting + autocomplete is not the best… :( : the one downside to it being Groovy based is that the autocompletion or highlighting will just be a little off. Honestly, that is majorly inconvenient, and it’s the worst thing about it. But compared to all the things it gives you I’d say this is an acceptable trade-off. Especially if you consider all the things you can do in this and for the expressive tests you can write in this… I’m happy to deal with that.

  • Built-in expressive mocking: it has built-in, very expressive mocking, which is better than any other kind of mocking framework I’ve seen in Java, and I’ve used a good few of those. Because you’re in Groovy, again, you’ve just got this ability to write the mocking setup and verification and so on in a way that you just can’t do in Java.

  • Readable, unrolling, data-driven tests: and you’ve got these data driven tests, these unrolling, wonderful, magic, data driven tests which I absolutely love.

Q&A

Q: Spock and Espresso. Is it possible? JR: I have seen a couple of things that looked at doing this, such as people’s GitHub repos. And yes, it works. There’s a lot of syntactic trivia you can put around Espresso that actually makes it read more naturally anyway in Java. Unfortunately, it doesn’t come with that, but yes. So you can use Spock with that as well, but when you consider a lot of the things you’ll be verifying in Espresso, you would be verifying those using Espresso’s methods for looking through the view hierarchy, and so on, so the amount it gains you is less. There was one project that tried to do it with Robolectric, RoboSpock which was a bit crazy, and then there was actually one that got it working with Espresso properly, and it did work. But yeah, I don’t really use it myself. I mostly use this for unit tests.

Resources


Jon Reeve

Jon Reeve

A freelance mobile developer and amateur coffee fanatic with more than 7 years experience in Android (still got that G1!), and over 10 years previous in everything from JEE and Swing to C & C++. Previously worked in the automotive and broadcast industries, then at Shazam, and nowadays hopping all over the place looking for challenges. Passionate about clean code, tests and TDD, and taking things apart to see how they work.