Surviving a Legacy Code Apocalypse: Android Dev at Twitter Scale

As Android matures, codebases are getting bigger, & legacy code is all over the place. Working in that environment can easily be hell, unless you apply the right strategies to refactor this smelly code. In this talk from Øredev, Israel Ferrer covers his techniques for dealing with legacy code. Testing will take a key role, as it gives you the ability to make changes faster without breaking existing features.


Hi! I’m Israel Camacho, Android engineer for the past seven years, now working at Twitter. In this talk I will discuss techniques, strategies, and code examples on how to refactor and live with legacy code on Android.

What is Legacy Code? (0:42)

“Legacy code is simply code without tests.”

This quote is by Michael Feathers via Working Effectively with Legacy Code, a book I highly recommend. Although generalized, I don’t disagree with this statement. Without tests, you don’t know if your code is getting better or worse. Specifically, the purpose of this software is to fulfill requirements and be useful for customers. If we don’t have tests, we don’t know what behavior is happening outside of manual testing. Investing in manual testing is time taken away from making better features, products, and experiences.

For this reason, tests have been key in my career since I can always rely on them. Failed tests mean a broken behavior, whether by mistake or in order to create a new behavior.

Testing Pyramid (2:24)

Building a testing foundation upon unit tests, which are less flakey, cheaper to maintain, and fast. Unit tests test the unit, the method, the class, correct output for input. Functional testing is tests between classes and their interactions. Lastly, end-to-end tests test the whole use case – the whole flow from the UI to the backend. Tests help you avoid legacy code by ensuring that problems don’t happen.

Here’s an example of why we should care about tests. In a file called BadView.java with more than 4000 lines and 180 methods, there is no Javadoc nor any tests. As a new member of the team with this file, I’m tasked with adding a feature to 4000 complex lines. Its Cyclomatic Complexity Numbers are remarkably high, and I would either break the entire view trying to change it quickly or take months to figure it out correctly.

Legacy code makes us slow, which means it costs more money to the company. It also makes it difficult to work with, and a team of developers may lose members demotivated by it. In the end, maintaining legacy code costs time, money, and people.

Detecting Legacy Code: Fixing Code Smells (6:50)

Code smells are symptoms that something in the code is wrong. Although not scientific, they give you the feeling that something is not right and you might want to question your choices. Here are some common examples of code smells to avoid:

Avoid duplication of code (7:41)

Don’t repeat yourself, particularly when there’s distinct separation of responsibilities that can be made.

For example, in the Digits SDK that allows you to sign up and log in with your phone number, we had a bug with the theming support. The bug was related to ActionBarActivity and Activity, and to solve it quickly we made the bad choice of duplicating the code.

This will clearly become a problem when I make a change to one of them and forget to update the other one. To solve this issue, I decided to create a delegate and then that delegate was encapsulated in the Activity logic. Then, I just have to initiate that delegate in one Activity and the other, avoiding duplication of code.

Avoid long methods and parameter lists (10:27)

The problem with long method names and parameter lists is that you just don’t understand them. Even if you write them yourself, you’ll forget them after a week and won’t have anyone to blame. Keep it simple, keep it short. If a method name or list is too long it’s because it’s doing too many things. Break it up into shorter, more digestible method names and you’ll be much better off.

For example, you could break a method that is responsible for binding a lot of data into multiple methods that are called right after the other, cleaning up the parameter list and isolating each of them into smaller chunks.

Avoid large classes (13:10)

The same general idea applies to classes as well. Too much code, too many instance variables, too much complexity can create daunting classes. There is not a silver bullet but usually if the class is that big it’s because it’s doing more than one thing. Try to extract interfaces so you will understand if the class is doing too much, and why is it doing too much. You can do something similar to what I showed to you with the method, but with a class.

Avoid comments everywhere (13:42)

This is my favorite one. Sometimes I see methods with four lines of code and five comments – how is that even possible? The thing with comments is that you write faster code than comments. At some point you stop reading them or updating them since they don’t compile and you don’t care for it enough.

Don’t get me wrong, comments are good for pulling APIs and defining intention of methods. But if you feel that you have to add too many comments in your method, then maybe it’s too complex.

Instead of writing a comment, write a test case. Test case makes more sense, because I can compile them and I see the results immediately and clearly; they’re either red or green.

You can find more about these code smells in the book Improving the Design of Existing Code, there are many many more.

Automate All the Things! (15:30)

Anything that could be automated should be. Of course there are things that can’t be automated, but unlike people, automation can’t make mistakes.

When we develop at Twitter we have an automated process. In this team, Ty and I use Gerrit as our code review tool. As we submit code, the CI butler there Jenkins, runs all the C8 and testing jobs. Hopefully he likes my code and I’m not breaking any tests. He then verifies the change, and then Ty approves the changes and I can submit the code to our git repo that we can deploy.

We then use SonarQube which gets its figures from Jenkins and provides a nice dashboard with metrics such as lines of code, files, functions, units of coverage for each commit. Also, it shows the statuses of documentation, comments, and their own scale rating.

How to Clean Legacy Code (18:14)

Now that we know how to detect code smells, let’s see how we clean it. For cleaning, I really like to have a SOLID code. This is an acronym by Robert C. Martin:

  • S = Single responsibility
  • O = Open-closed principle
  • L = Liskov substitution
  • I = Interface segregation
  • D = Dependency inversion

Single responsibility (18:56)

A class should only have one reason to change – one goal to accomplish, and that’s all.

For instance, the following User class has the data to represent a user and a bunch of methods:

public class User {
  public String calculateAge() {...}
  public void save() {...}
  public void restore() {...}
  public String describeUser() {...}
}

Save and restore are used to persist the user, which is bad as that creates duplicate code when other classes need to save and restore from the database. This clarifies my single reason to exist.

A better solution would be to have a user class that has the knowledge of a user, and then a persistence class which actually saves and restores the object.

Open-closed principle (20:18)

Open for extension, closed for modification. Do whatever you need done, don’t think too much on whatever you will need in the future, and use abstraction interfaces rather than complete classes.

Liskov substitution (21:58)

Liskov substitution: derived classes must be substitutable for the base classes. The output should not violate the contract that you’re defining. For instance, if we have a sum operation, and we subclass that class and we change the sum operation, be sure to expect a sum and not do any other operation.

Interface segregation (22:52)

Interface segregation means make interfaces really thin. It’s pretty similar to having too many responsibilities. Try not to do that because when you define an interface doing more than one thing, what happen is that many classes are not going to implement all the methods, but then when you have to modify that interface and those classes don’t even implement them, you’re going to have to modify these classes anyway. Here’s an example of this done wrong:

interface Persist
{
  boolean save(Object reservation);
  boolean restore(long id);
  void log(Object reservation);
  void sendNotification(Object reservation);
}

log and sendNotification here don’t make any sense, since the class name is Persist. Keeping true to its name, it should just persist and restore stuff. To simplify this, we should create two interfaces. One for Persist, one for Logger:

interface Persist
{
  boolean save(Object reservation);
  boolean restore(long id);
}

interface Logger
{
  void log(Object reservation);
  void sendNotification(Object reservation);
}

Dependency inversion (24:15)

A good example of this is the Hollywood Principle: “Don’t worry: We’ll call you when we need you.” The same idea applies to code: just pass all the dependencies, and we will call them when we need them. Of course, high level modules should not depend on low level modules. You should depend on abstractions, not on details.

SOLID TL;DR: Separation of concerns, Composition over inheritance, and Don’t repeat yourself.

How to Work With Legacy Code (26:19)

To work with legacy code, keep the focus on one task. By Martin Fowler’s hat metaphor explains this well, in that you can only wear one hat a time. For example, you could wear a builder hat when creating new code, a fancy hat when cleaning and simplifying it, a performance hat, and many more, but only one at a time.

In order to do that refactor in improving that legacy code what we have to do is first try to add tests to that method. If that’s not possible, go a level up in testing - intervention tests or end-to-end tests. Do something, so you know that by modifying that old code you are not breaking the app.

But if you don’t want to do that you can refactor the code and take some risks, but without actually changing any code. You are just moving code around so you are able to inject the dependencies and mock them. Once the code is refactored and is testable, we add the test, then we add the test for the new behavior that we want to add. We implement the new behavior and we are sure that these new behaviors pass the tests that we were reading before.

Once that’s ready, focus on improving the design. That’s what I follow, usually when I get a new feature request, I implement it really fast and see if it works. Once it works, I try to make it better, such as making a nice API that anybody in my team can work with.

Finally, verify that all tests passed and then ship it. 🚢

For a refactoring example of bad code, follow along with the example code in the video above and each commit of this repo for the movement of code during the refactor.


Israel Ferrer Camacho

Israel Ferrer Camacho

Israel has been developing Android apps since Cupcake. He is really interested in building reusable, testable and maintainable code, without forgetting to delight users with Material design experiences. Israel is currently working as Android developer at Twitter.