An Artsy Testing Tour

Artsy has four iOS applications; all of them are open source, and all of them take different approaches to testing. Why? Because different testing techniques work better or worse in different circumstances. From try! Swift, Ash Furrow discusses the motivations behind the Artsy iOS team’s decisions, the struggles they encountered, and how they overcame those challenges to give you a better understanding of when and why testing is important to building fantastic software.


Testing at Artsy (0:00)

I’m Ash, an open-source developer at Artsy, a company in New York. We’re trying to make art as popular as music. Principally I work on iOS applications, which is why I’m here today.

At Artsy we’ve built four different iOS applications and each one of them takes a different approach to unit testing. We’ve learned a lot from our experience and trying things to see what works and what doesn’t, and what works better or worse in different situations.

We’re going to walk through the process of unit testing Artsy’s four applications. Now, while two of these applications use at least some Objective-C, everything discussed here is applicable to Swift as well. Throughout the talk, I want you to keep in mind three concepts:

  1. Don’t worry about being perfect. Don’t worry about not getting 100% test coverage. Testing is intrinsically valuable, not just because you end up with unit tests, but because there’s value in the activity of testing itself.
  2. Break existing components into smaller components because smaller pieces are easier to test.
  3. For new applications you want to keep components as small as possible - like ridiculously small, and we’ll talk about how small that is.

Emergence Testing (1:36)

This is our first application, codename: Emergence. It’s our Apple TV app and it serves a very narrow focus. Users launch the application, they select a city, and they get to see what art shows are going on in that city. They might select Tokyo, for example, and see the different art shows going on there.

Let’s take a look at its testing strategy:

¯\_(ツ)_/¯

Mm-hmm, no tests.

I admit that it is a bit unusual to start a presentation about unit testing with an application that has no tests, but I think it’s important for two reasons: First, sometimes you can’t test or you shouldn’t test. Second, I just want to get some credibility with the developers in the audience who think testing is not that important.

I understand that in our community testing is sometimes new, and it feels difficult and intimidating, and there are times that you can’t test, and that’s all okay.

I want to talk about the reasons why we didn’t test this application.

  • First, it was built very quickly, under a month. We had a very hard deadline because the Apple TV was launching and we had to be there on day one.
  • It was built by only one person working in isolation, my colleague Orta, in Manchester. It was just one person working throughout the day. And there’s minimal ongoing maintenance with no new features that we’re planning on adding. It really is just… it’s complete.
  • The app has very narrow focus, it does one thing and it does it well.

Whether or not to test, just like anything in software development, is a balance. Every team and every team member has to be aware of this balance and make their own decisions. Emergence is a very small application. I asked my colleague and he said he could rebuild it from scratch within two weeks. It needed to be implemented in a hurry. And we didn’t add tests because the time pressure outweighed the value of testing.

So, to recap, should you test? Probably. But not necessarily. You also shouldn’t feel bad if you can’t.

Energy Testing (4:00)

Next, I want to talk about an application called Energy. This is our app for art galleries. This is how an art gallery manages and shows their inventory to prospective art buyers. It’s actually our oldest application, it’s our first codebase. Originally, it was built with no tests, and they were added later on. That was a very conscious decision. Why did we decide to add tests?

Why add tests? (4:53)

Well, the codebase for Energy is significantly larger than Emergence, so there’s more that can break. The code that keeps Energy in sync with our API is very important. It’s critical code that must not break. We must not introduce bugs, so we tested it to make sure that we didn’t.

Most importantly, we wanted to reduce the bus factor. The bus factor is a measurement of how concentrated the knowledge is among individual team members. Think of it as if this person were hit by a bus, would the product fail? Would your business fail?

We added tests because we needed to record the institutional knowledge locked away in my colleague’s brain and tests were a good way to define how our application works. Unit tests are a form of documentation about how your app and how your app’s parts work. Good tests will define how the parts of your application behave. That means not testing the internals of your classes, but only the external behavior.

When I tell people to keep their classes very small, they often complain and say if they keep their classes small, then they’ll have more classes. Oh no. How terrible, all these small, composable, well-tested classes.

Dependency injection (6:19)

Energy has the best unit tests of any of our iOS applications. What makes its tests so great? We use dependency injection a lot. Dependency injection is the idea that objects don’t create the things they need to do their job, their dependencies. Instead, these are injected into the object.

You might give an object a Core Data managed object context instead of having it access a singleton. We use dependency injection for stubbed managed object contexts as well as for the network synchronization code. Energy uses in-memory Core Data managed object contexts that can be quickly and cheaply created, then destroyed, for our unit tests. Part of what we can test is how an object modifies the Core Data store: we create a managed object context, inject it into the object that we’re testing, let the object do something, and then inspect the managed object context to see that the changes performed on it are what we expect.
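As a rough illustration of that inject, act, inspect flow, here is a minimal sketch in plain Swift. It deliberately avoids Core Data so that it runs anywhere: `ArtworkStore`, `InMemoryArtworkStore`, and `SyncImporter` are hypothetical names standing in for a managed object context and the synchronization code, not Energy’s actual types.

```swift
// Hypothetical stand-in for a managed object context: the object under test
// only ever sees this protocol, never a singleton.
protocol ArtworkStore: AnyObject {
    var titles: [String] { get }
    func insert(title: String)
    func delete(title: String)
}

// Cheap in-memory implementation that a test can create and throw away.
final class InMemoryArtworkStore: ArtworkStore {
    private(set) var titles: [String] = []

    func insert(title: String) { titles.append(title) }
    func delete(title: String) { titles.removeAll { $0 == title } }
}

// Hypothetical object under test. Its dependency is injected via the initializer.
final class SyncImporter {
    private let store: ArtworkStore

    init(store: ArtworkStore) {
        self.store = store
    }

    // Makes the local store mirror the titles returned by the (stubbed) API.
    func sync(with remoteTitles: [String]) {
        for title in remoteTitles where !store.titles.contains(title) {
            store.insert(title: title)
        }
        for title in store.titles where !remoteTitles.contains(title) {
            store.delete(title: title)
        }
    }
}

// The test creates the store, injects it, acts, then inspects the store.
func testImporterMirrorsRemoteTitles() -> Bool {
    let store = InMemoryArtworkStore()
    store.insert(title: "Stale artwork")

    let importer = SyncImporter(store: store)
    importer.sync(with: ["Artwork A", "Artwork B"])

    return store.titles == ["Artwork A", "Artwork B"]
}
```

Because the importer never reaches for global state, the test controls everything it touches and only checks external behavior: what ended up in the store.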

Visualizing with RSpec testing (7:38)

Our tests initially had three tests and they each do something. I don’t know what they do because it’s a bunch of code and I’d have to read the code in order to find out. It’s probably a bunch of repeated code, but I don’t know, it’s a mess, who knows what’s going on.

We’re going to apply RSpec-style testing instead. The term RSpec comes from the Ruby community. You can use libraries like Kiwi, Specta, or Quick for iOS. To use RSpec-style testing we’re going to look through our different tests and identify the setup that is common to all of them. Once we identify that setup, we can refactor it out into only one spot.

We write the common setup code once, and it’s executed for each test. We put this common code in a “before each” step that’s run before each test. In Energy, this is often creating an in-memory Core Data store; each test can then rely on the store having been created. The beauty of RSpec-style testing is that we can repeat this process: we look at the remaining tests, identify the setup they have in common, and refactor again.

Our two innermost tests can now rely on both the inner and outer “before each” steps having run. For example, if the outer “before each” step creates a managed object context, the inner step might add some test data to it. So, let’s look at what we started with. Three tests. What are they doing? How complex are they? We have no idea. We would have to read each test in order to find out, and that takes time and it’s boring and you have better things to do with your time.

In contrast, the new tests are more readable. It’s less code to write and to read and, importantly, unit tests are untested code which means that they should be as obvious as possible. Visually, we can tell how many contexts we’re testing, how nested they are, and how complex each test is by its relative length. If we have too many nested contexts, then our objects are probably too complex.

Importantly, we can create inner and outer contexts and give them descriptive names. That makes identifying a failing test very easy. The error message that Xcode will give you will be something like “In a Core Data store with sample data, deleting objects failed.”
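To show how nested contexts and “before each” steps fit together, here is a toy harness in plain Swift. It is only a sketch of the idea and is not the API of Quick, Specta, or Kiwi; `Context`, `FakeStore`, and `runExamples` are made-up names, and the fake store stands in for an in-memory Core Data store.

```swift
// Toy RSpec-style context: a name, an optional parent, and setup steps.
final class Context {
    let name: String
    private let parent: Context?
    private var setupSteps: [() -> Void] = []

    init(_ name: String, parent: Context? = nil) {
        self.name = name
        self.parent = parent
    }

    func beforeEach(_ step: @escaping () -> Void) {
        setupSteps.append(step)
    }

    // Outer setup runs first, then this context's own setup.
    private func runSetup() {
        parent?.runSetup()
        for step in setupSteps { step() }
    }

    // Joins the names of all enclosing contexts, outermost first.
    private var fullName: String {
        guard let parent = parent else { return name }
        return parent.fullName + " " + name
    }

    // Runs one example. Returns nil on success or a descriptive failure message.
    func it(_ description: String, _ body: () -> Bool) -> String? {
        runSetup()
        return body() ? nil : "\(fullName), \(description) failed"
    }
}

// Stand-in for an in-memory Core Data store.
final class FakeStore {
    var objects: [String] = []
}

func runExamples() -> [String] {
    let store = FakeStore()

    let outer = Context("In a Core Data store")
    outer.beforeEach { store.objects = [] }  // fresh store for every example

    let inner = Context("with sample data", parent: outer)
    inner.beforeEach { store.objects.append("Sample artwork") }

    let results = [
        inner.it("deleting objects") {
            store.objects.removeAll()
            return store.objects.isEmpty
        },
        // Deliberately wrong expectation, to show what a failure message looks like.
        inner.it("counting objects") { store.objects.count == 2 }
    ]
    return results.compactMap { $0 }
}
```

The failure message is assembled from the context names, which is why descriptive names for the inner and outer contexts pay off when a test breaks.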

Energy recap (10:56)

To recap, you can add tests to help document your application and the parts of your application. You should test the behavior of your objects and not their internals. If the behavior is too complicated to be tested without touching the internals, then it’s too complicated to be in your codebase. In my opinion, each class would ideally have only one public function.

Dependency injection helps us write tests by relying on mocks for network access and Core Data stores. RSpec-style tests help us with more concise and expressive tests. They make our tests more obvious, which is important because again, unit tests are untested code.

Eigen Testing (11:50)

Next up, we have Eigen. This is our consumer application. This is the app that you should all install on your iPhones and iPads for exploring art.

We originally built it without any unit tests and we added them later. The application was a rush job, built quickly by three developers. We added tests because we had already done it in Energy and we saw the value of testing; we just didn’t have the time when we started.

Testing challenges (12:42)

Eigen is probably the most similar to a typical application, something you might be testing. It’s got lots of code, tangled together, written by many developers with lots of different coding styles; it was hard to start testing this application.

One of the biggest reasons it was hard was because it has network access distributed throughout the codebase, compared to Energy where the network access was in one place: the synchronization code. Making sure that we’re mocking all of our network responses remains a huge problem when we’re trying to test Eigen. It’s also an ongoing effort to add more tests, which is difficult because we’ve had many different developers work on this application and they all write code in slightly different styles.

Energy, by contrast, was worked on by relatively few developers and has a very uniform way to test things. In Eigen, the many different coding styles mean that we need different testing techniques, which adds a lot of cognitive overhead.

Testing multiple platforms (13:35)

Eigen started out just on the iPhone, but eventually became a universal application, which raised the question of how do we test on multiple platforms. There are different options, but the one that we went with is something called Shared Examples.

Shared Examples allow you to define tests that rely on a context that is given to them. The tests are run several times with different contexts. In this case, we have two contexts: the tests should pass on the iPhone and they should pass on the iPad.
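The core idea can be sketched in plain Swift as a function that takes the context as a parameter and is run once per device. This is not how Quick or Specta implement shared examples; `Device`, `columnCount(for:)`, and the specific checks are hypothetical and only illustrate the shape.

```swift
// The contexts the shared examples are run against.
enum Device: CaseIterable {
    case phone, pad
}

// Hypothetical code under test: how many artwork columns a grid shows.
func columnCount(for device: Device) -> Int {
    switch device {
    case .phone: return 2
    case .pad: return 4
    }
}

// The shared examples: the same checks, run against whatever context is passed in.
// Returns a list of failure messages, empty when everything passes.
func gridSharedExamples(device: Device) -> [String] {
    var failures: [String] = []
    let columns = columnCount(for: device)

    if columns < 1 {
        failures.append("\(device): grid shows no columns")
    }
    if columns % 2 != 0 {
        failures.append("\(device): grid has an odd number of columns")
    }
    return failures
}

// Run the shared examples once per context and collect every failure.
func runOnAllDevices() -> [String] {
    Device.allCases.flatMap { gridSharedExamples(device: $0) }
}
```

Writing the examples once and feeding them each context keeps the iPhone and iPad expectations from drifting apart.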

When we were adding tests originally, we tried to test the biggest classes first, since we thought they were the most fragile and should be tested with the highest priority. But they were so big that in order to test them we had to test their internals, and now they’re very difficult to change because if we change any of the internals, then we have to update many tests.

Our general rule is that all new code added to the codebase has to be tested. If we have time while we’re modifying existing code, we make it a priority to add tests to that code as well. Fixing a bug always requires adding a test to prevent regressions, to prevent the bug from happening again.

Snapshot testing (15:19)

One of my favorite ways to test an iOS application is using snapshot tests. They rely on a really amazing library from Facebook, and here’s how they work:

You set up your view or your view controller and you give it some data and then you record a picture of what that view looks like and you save it in a .PNG that stays in your repository. Later on, when you’re re-running your tests, you do the same setup, you give your view the same data and you take another snapshot and you compare that snapshot, pixel for pixel, with your reference .PNG. If they’re different, then the test fails.
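The record-then-compare cycle can be sketched without any UI framework. The code below is not the Facebook library’s API: `FakeLabel` renders to a made-up byte array instead of a real bitmap, and `SnapshotStore` keeps references in memory rather than as .PNG files in the repository. All of these names are assumptions for illustration.

```swift
// Hypothetical "view": rendering just turns the text into bytes,
// standing in for the pixels of a real rendered view.
struct FakeLabel {
    var text: String

    func render() -> [UInt8] {
        Array(text.utf8)
    }
}

// Holds reference images by name. A real setup would store PNGs on disk.
final class SnapshotStore {
    private var references: [String: [UInt8]] = [:]

    // Record mode: save the current rendering as the reference.
    func record(_ pixels: [UInt8], named name: String) {
        references[name] = pixels
    }

    // Verify mode: compare pixel for pixel against the reference.
    // A missing reference counts as a failure.
    func verify(_ pixels: [UInt8], named name: String) -> Bool {
        guard let reference = references[name] else { return false }
        return reference == pixels
    }
}

func snapshotDemo() -> (unchanged: Bool, changed: Bool) {
    let store = SnapshotStore()

    // First run: set up the view with some data and record the reference.
    store.record(FakeLabel(text: "Bid now").render(), named: "bid-button")

    // Later run, same setup and same data: the comparison passes.
    let unchanged = store.verify(FakeLabel(text: "Bid now").render(), named: "bid-button")

    // An accidental change to the view makes the comparison fail.
    let changed = store.verify(FakeLabel(text: "Bid later").render(), named: "bid-button")

    return (unchanged, changed)
}
```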

This is really useful. It has a downside in that your repository is going to get larger on disk because you’re going to be storing .PNGs, but it’s a great way to detect accidental or inadvertent changes to your user interface.

They’re also really helpful for when you’re reviewing pull requests. For example, you can see how an interface has been changed. Or you can see, visually, in the pull requests how a new interface looks. It makes reviewing new pull requests really easy.

Getting help (15:31)

Sometimes we get stuck and we don’t know how to do something so we talk to other iOS developers. Maybe we’ll write a blog post, or we’ll tweet a link to an issue or a pull request that we’re working on on GitHub. Having our applications be open-source really helps with this. If we get really stuck, we’ll usually pick some solution, even if it’s not an ideal one, but we’ll try to document how we would want to improve it, so if and when we have an opportunity to come back to it later on, we know what to do. Remember, nothing is ever complete, nothing is ever perfect, and that’s okay.

Eigen recap (17:26)

To recap, we started writing tests and we didn’t break things down and we’re still paying the costs for that. Now, we break existing classes into smaller ones in order to test them, and that makes the code easier to navigate and easier to test. If you’re just learning how to test, if this is new to you, I would recommend starting with the smallest classes first, so that you can get comfortable with unit testing.

Make a rule on your team that all new code has to be tested, and add tests to newly modified code when you can. We get stuck sometimes, and that’s okay. We ask for help because the iOS community is great and a very helpful place. Sometimes we’re forced to write code that we know isn’t ideal or perfect, but we do our best to document why we made the decisions we did and how we’d like to fix it in the future.

Eidolon Testing (18:33)

Let’s talk about how we would write an application with tests from the beginning. This is Eidolon, it’s an application for bidding on artworks at an art auction. We use in-house enterprise distribution to deploy onto iPads that are physically mounted in enclosures so people at art auctions can browse and bid on artworks.

The app was originally built with tests, but time pressure near the end of the project required us to cut corners. We’ve been paying off the technical debt ever since, but adding tests while we maintain the application has helped us pay that debt down.

Swift testing (19:20)

This was our first Swift application and we started building it while Swift 1.0 was still in beta. The language, the compiler, Xcode, they were all changing around us and unit testing helped validate our understanding of the tools and our understanding of the language. It also made sure that our application didn’t break as the tools and language evolved.

It’s funny because unit testing was one of the only things that felt familiar in our first Swift application. Testing helped us adhere to software engineering principles and helped us verify our understanding of how Swift worked and made sure that we were confident in the quality of our code.

Using Quick (20:10)

Quick is an RSpec-style testing library used with Swift and Objective-C. We used Quick while it was still under development in order to write good tests, and we provided feedback to the Quick team who were very helpful in answering our questions as we were both sort of developing our application and the testing library in sync.

Quick gave us an RSpec-style testing framework and Nimble gave us really nice matchers. What are matchers? That’s a good question. Let’s take a closer look to see what a test actually looks like.

A good test in Eidolon is short, I would say a good test in general is short. It has three steps: arrange, act, and assert.

  1. Arrangement is mostly done in the “before each” steps.
  2. We act by calling the method that we’re trying to test the behavior of.
  3. We then assert that the behavior of the class we’re testing is as we expect, and we call these expectations.
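To make the three steps concrete, here is a minimal plain-Swift sketch. The `Auction` type and its `place(bid:)` method are hypothetical stand-ins rather than Eidolon’s actual code, and each test is written as a function returning `Bool` so that it runs without a testing framework.

```swift
// Hypothetical type under test: tracks the highest bid on an artwork.
struct Auction {
    private(set) var highestBid: Int

    // Accepts a bid only if it beats the current highest bid.
    mutating func place(bid: Int) {
        if bid > highestBid {
            highestBid = bid
        }
    }
}

func testHigherBidBecomesHighestBid() -> Bool {
    // Arrange: in an RSpec-style suite this would live in a "before each" step.
    var auction = Auction(highestBid: 100)

    // Act: call the one method whose behavior we are testing.
    auction.place(bid: 150)

    // Assert: a single expectation about externally visible behavior.
    return auction.highestBid == 150
}

func testLowerBidIsIgnored() -> Bool {
    var auction = Auction(highestBid: 100)   // Arrange
    auction.place(bid: 50)                   // Act
    return auction.highestBid == 100         // Assert
}
```

Each test stays short and checks one thing, so a failure points directly at the behavior that broke.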

Sometimes they’re called assertions. Assertions, expectations are essentially the same thing. So, what does an expectation look like?

Nimble pattern matching (21:46)

That’s where Nimble comes in. Let’s take a look at what an XCTest assertion looks like:

XCTAssertEqual(1 + 1, 2, "...")

It is verbose. Now, let’s see what it might look like with a Nimble matcher:

expect(1 + 1).to( equal(2) )

That’s much better. It’s more concise and more expressive. But we can do even better. Using custom operator overloading, we can become even more concise.

expect(1 + 1) == 2

This is how I like to write tests. Instead of assertions we call these expectations, but remember, they’re essentially the same thing. Nimble has more than just matchers for equality; it has built-in matchers for strings, arrays, ranges, all types of things.
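If you’re curious how the operator form can work, here is a toy version in plain Swift. It is not Nimble’s implementation: real Nimble reports failures to the test runner, while this sketch simply returns a `Bool`. The `Expectation` struct and the simplified `expect`, `equal`, and `contain` functions below are assumptions written for illustration only.

```swift
// Wraps the actual value so matchers can be applied to it.
struct Expectation<T> {
    let actual: T

    // A matcher here is just a function from the actual value to pass/fail.
    func to(_ matcher: (T) -> Bool) -> Bool {
        matcher(actual)
    }
}

func expect<T>(_ value: T) -> Expectation<T> {
    Expectation(actual: value)
}

// Equality matcher.
func equal<T: Equatable>(_ expected: T) -> (T) -> Bool {
    return { actual in actual == expected }
}

// Operator overload: lets `expect(x) == y` stand in for `expect(x).to(equal(y))`.
func == <T: Equatable>(lhs: Expectation<T>, rhs: T) -> Bool {
    lhs.to(equal(rhs))
}

// A custom matcher is just another function with the same shape.
func contain(_ element: String) -> ([String]) -> Bool {
    return { actual in actual.contains(element) }
}

let longForm = expect(1 + 1).to(equal(2))
let shortForm = expect(1 + 1) == 2
let custom = expect(["Tokyo", "New York"]).to(contain("Tokyo"))
```

Both spellings run the same matcher; the operator only changes how the expectation reads.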

There are also asynchronous matchers, so you can very easily test asynchronous code, which is sort of difficult to do with XCTest alone. You can also write your own custom matchers that perform custom expectation handling, which is what we did with Nimble in the snapshot library that I mentioned earlier.

Eidolon recap (23:11)

To recap, how do we write effective tests? We use RSpec-style testing with Quick, we use matchers from Nimble to write expressive, short tests, we write custom matchers in order to keep our tests expressive, and we try to limit the number of expectations to one per test. The tests should be short, they should be obvious, and they should be plentiful.

Testing Review (23:40)

Let’s review the three concepts that I asked you to keep in mind at the beginning of the talk.

  1. I asked you to not worry about being perfect or complete because nothing is ever perfect. Nothing is ever complete. We saw that sometimes we have to write code that’s not ideal and that’s okay. Sometimes we get frustrated, we get stuck on a problem, and that’s normal. Asking for help is really useful because we’ll get it from the iOS community.

  2. Try to break existing components into smaller ones because they’re easier to test and easier to maintain. And we only test the public interface of a class, never the internals. The public interface, by the way, should be very small. One public function is my ideal. For new applications, we try to keep things small from the beginning. Clearly defined logic and coding styles are easier to test, but it’s not always possible.

  3. By keeping our classes small, we keep coding style differences contained within class boundaries so that our classes are easier to test. And keeping classes small on Eidolon helped us minimize the technical debt as we were developing and helped us pay down that technical debt later.

All of our iOS applications are open-source and we’ve worked hard to make sure that running them is as easy as possible for open-source developers in the community. I’d love for you to go check them out and if you have any questions about how we’ve built our applications or how we’ve tested our applications, just open a GitHub issue.

Q&A (25:03)

Q: How would you explain the importance of testing to a skeptical teammate coming from server-side coding?

Ash: I tried to talk about some of the benefits of testing throughout the talk: documenting the behavior of the application and making sure that critical pieces of code don’t break. I would say that testing iOS applications is at least as important as testing on the server side. Deploying, say, a Rails application doesn’t have to go through App Store review, but an iOS release does. So if you ship a bug that could’ve been caught with a unit test, you’re probably going to be waiting a week for the fix while your customers have that bug.

Q: What’s your opinion on UI testing or integration tests?

Ash: I have been undecided on UI testing and integration testing, mainly because I was unimpressed by the tools Xcode had until recently when UI testing was added. To be honest, I haven’t explored it thoroughly. I find that snapshot testing is sufficient as long as you make your views and view controllers very thin. This way they don’t do a lot of stuff, they really just respond to user interaction and present data. And then, if you have all that logic and stuff in another class, then you can just test that class in isolation.

So you don’t have the end-to-end testing, but lots of things work without end-to-end testing. We sent a car to Mars and we couldn’t end-to-end test that. We had to test each piece in isolation, so if it’s good enough for NASA, it’s good enough for us.

That’s just my opinion, though. I think other people have preferences around UI testing and if it works for your team and if it works for you, then go for it.

Q: Do you have any advice on getting into test-driven development and the thought process it requires?

Ash: Test-driven development is contentious. Some people really like it. Some people really don’t like it. I am sort of in the middle. I like to write code and I’m very impatient so I often just write the implementation and then test it afterward, even though I know I shouldn’t.

Having written the code, I’ll go to write the test and I’ll often find that I’ve written my class in a way that is difficult to test and so I will change it. Having done that over and over again, I’ve gotten better at writing my classes in a way that makes testing them easy and I think that experience of having gone through that is valuable in the same way that test-driven development is valuable.

I don’t think it’s as robust an approach as TDD, but it’s good enough. For our team and for me, the tradeoff of not doing TDD is worth it.

As far as advice goes, I would say keep your classes small. If you have a hard time testing something, try splitting it into several pieces and then testing each individual piece and then using dependency injection to test how they work together. Lastly, just practice. With experience you’ll get better at it.


Ash Furrow


Ash Furrow is a Canadian iOS developer and author, currently working at Artsy. He has published four books, built multiple apps, and is a contributor to the open source community. He blogs about a range of topics, from interesting programming to explorations of analogue film photography.