Managing Consistency of Immutable Models

Immutable models have many advantages, but managing consistency can be difficult. At LinkedIn, we’ve open sourced two Swift libraries to help manage consistency and persistence of immutable models. Unlike Core Data, the API is non-blocking, exception-free, and scales well with large numbers of models and view controllers.


Introduction (0:00)

My name is Peter, and I’m an engineer at LinkedIn. Today I’d like to talk to you about how we’ve been using immutable models and how we keep them consistent. Earlier this week, we open sourced a library to help us manage the persistence and consistency of immutable models. First I’m going to talk about why we wrote this library. Then I’m going to show you an example of how it might be used and dive into how it actually works and all the internals. I think that most people shy away from using immutable models because they believe that it’s really difficult to keep them consistent across their application. Hopefully today, I can persuade you that this isn’t exactly true.

Core Data (1:08)

Let’s go back about a year and a half ago to when we were rewriting our LinkedIn app in Swift. We wanted to build a caching and modeling solution to save data which we got from the network. We would first request data from our cache and display that to the user. Then we’d fetch data from the network. We’d then want to save that into the cache so the next time the user launched the application, they’d have a really quick and snappy user experience.

In searching for a caching solution, the first thing we looked at was Core Data. We’d shipped Core Data on a bunch of apps before, and we’d had some success, but we’d noticed a bunch of problems with it.

The first problem we ran into was stability. Core Data is notorious for crashing if one small thing goes wrong. It’s really easy as a developer to introduce subtle race conditions which can be tricky to debug. In general, we found that approximately 50% of the crashes that we had on our Core Data applications were in some way related to Core Data itself, and these crashes were one-offs here and there. There wasn’t one big bucket that we could fix all at once, which made these issues really difficult to diagnose.

The other issue we ran into was performance. In most Core Data setups, you read on the main thread; you can write on a background thread, which helps out a lot, but these reads can be unpredictable. Sometimes you might get a fast read because the data is already in memory, but sometimes you have to hit the disk, you have I/O, and it can be slow. Again, it’s just hard as a developer to manage this.

Another issue is there are no eviction strategies. You couldn’t set a limit, a max of 30 or 40 megabytes, which means that it’s extremely hard to manage that as a cache as your application runs for weeks and months. The user uses it, and it uses up too much disk space. Another point was that migrations are necessary, so if you change the schema, you need to write some kind of migration. We change our model schemas almost daily as we add new features, so this was a very big deal for us.

The final thing for us was scalability. A couple of years ago, Facebook talked quite a lot about the problem of scaling Core Data. They believe that Core Data is very difficult to scale to large applications. Given that our application has hundreds of view controllers and hundreds of models, we were terrified of this. Even if we were to solve all of these problems, we didn’t like the programming model that Core Data provided. One of the things we didn’t like is that all the models were mutable, and they’re not thread-safe, so when you pass one to a different thread, you need to reread it from the database, and you run into those performance problems I just talked about. Since we were writing something new, we wanted to adopt a different pattern.

Immutable models (3:29)

What we really wanted to try was to use immutable models for all the data in our application. Immutable models have long been a cornerstone of functional programming languages, and with the introduction of Swift to iOS, I think that they’ve been pushing this idea of immutability. Swift has a bunch of “immutable by default” features, and personally, I’ve been learning a lot about immutability because of Swift. I’m going to list a few reasons why immutable models are beneficial:

  • The first is that they’re thread-safe. Since you can’t write to the models and they’re read-only, thread safety comes for free.
  • They’re very performant. With Core Data, when you access some property, it might not really be there and it might lazily do a fetch, again hitting the disk. With immutable models, it’s impossible to do it this way, because doing so would actually change the model. You always need to construct the entire model before you give it to the UI. That takes a little bit of extra effort up front, but this effort happens on a background thread, so it’s going to be non-blocking. As soon as it hits the main thread, everything is in memory, ready to go and fast.
  • Immutable models are easier to debug. I’ve been coding immutable models for about a year and a half, and I really can’t imagine going back at this point. When I look at a big function, and some models are being passed around to other classes and other functions, all of these models I know are invariant. They’re not going to change. I don’t need to keep printing out their properties to make sure that some function isn’t changing one of the values.
  • The final thing is that they promote writing functional code. Functional code has a bunch of advantages, but one of them is that it makes it much easier to write unit tests for your code.

For more examples, Keith Smiley gave a talk on immutable models a couple of months ago here at SLUG. It was a really great talk, and he goes into much more detail about why immutable models are great.
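To make the idea concrete, here is a minimal sketch of an immutable model, written in current Swift syntax. The PersonModel fields and the with(isOnline:) helper are assumptions made up for illustration; they are not taken from RocketData. The point is only that a "change" produces a new value and leaves the old one alone.

```swift
// Hypothetical model for illustration; not part of RocketData.
struct PersonModel: Equatable {
    let id: String
    let name: String
    let isOnline: Bool

    // Returns a new value with one field changed; the receiver is untouched.
    func with(isOnline: Bool) -> PersonModel {
        return PersonModel(id: id, name: name, isOnline: isOnline)
    }
}

let offline = PersonModel(id: "12", name: "Ada", isOnline: false)
let online = offline.with(isOnline: true)
// `offline` is unchanged, so any code still reading it sees consistent data.
```

Because every field is a let constant, nothing that receives one of these values can alter it behind your back, which is where the thread-safety and debugging benefits above come from.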

There’s one question which comes up every time you bring up immutable models. How do I keep things consistent? I think this is the example which people have in their heads when they’re asking this question:

class MyViewController: UIViewController {
    let personModel: PersonModel

    func comeOnline() {
        personModel.isOnline = true
        refreshView()
    }
}

They have a view controller, there’s some immutable model as a property, and they want to write some function where they change some property on the model as someone comes online and then refresh the view. That isn’t possible if the model is immutable. So they say, “Oh, man, this doesn’t work. I’m just going to go and use something like Core Data or some mutable model solution.” What we truly want is to have immutable models, but with all the consistency that we got from Core Data. If I change a model somewhere in my application, I want that change to be reflected across all my other models. Knowing this, we formed the following requirements for our caching and modeling system:

  • We wanted immutable, thread safe models for the reasons which I just went over.
  • We wanted consistency, so a change in one place would automatically be reflected elsewhere in the application.
  • We wanted non-blocking access on all reads and writes. There wouldn’t be any disk access on the main thread.
  • We wanted a simple eviction strategy. Looking at some of our peers, we saw that a lot of apps were using 200, 300, 400 megabytes of disk space, and ideally, for us, we wanted to limit it below 50 megabytes.
  • We wanted it to scale well. Again, as I said, we have hundreds of models and view controllers.
  • We wanted easy migrations. Actually, we wanted to write no migration code at all, ever.

To accomplish all these things we wrote RocketData.

RocketData (7:15)

RocketData is a caching and consistency solution for immutable models. It’s intended to replace Core Data or at least fulfill that role in an application. It’s written 100% in Swift, and we really like it. Let’s be a bit more concrete about these requirements and take a look at how RocketData would look in a view controller. After we go over that, I’m going to explain how it works. Here is a view controller similar to the one we looked at before:

class MyViewController: UIViewController {
    let dataProvider = DataProvider<PersonModel>()

    override func viewDidLoad() {
        super.viewDidLoad()
        dataProvider.fetchDataFromCache(cacheKey: self.id) { (_, _) in
            self.refreshView()
        }
        MyNetworkManager.fetchPerson(id: self.id) { (model, error) in
            if let model = model {
                self.dataProvider.setData(model)
                self.refreshView()
            }
        }
    }
}

The first thing you’ll note is instead of accessing a PersonModel directly, we have this DataProvider class, which is generic in terms of PersonModel. RocketData provides this DataProvider class, and that’s used for all your data access. To access the data, you call dataProvider.data, and this returns you a typed PersonModel which you can use for writing any business logic or view layout logic. It’s really easy. In the viewDidLoad, we want to do two things in parallel:

  • First, we’re going to fetch the data from the cache. In the completion block, all we’re going to do is call self.refreshView(). In this refreshView function, we would access the data from the data provider and lay out our view.
  • The other thing we’re going to do is fetch this person from the network. We’re going to call our own application or network stack, whichever way we choose to fetch data from the network.

In the completion block we’re going to call self.dataProvider.setData and then again, refresh our view. This setData function is going to do three things:

  • The first thing it’s going to do is immediately and synchronously set the data on the data provider such that we can access it in our refresh view.
  • The second thing it is going to do is lazily propagate this change to the cache and save it. Again, this isn’t going to block the main thread. It’s going to do this on a background thread.
  • The third thing it’s going to do is propagate this change to all of the other data providers in the application, and if there are any data providers who care about this PersonModel, it’s going to replace that model and give them a new one to use.
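The three steps above can be sketched roughly as follows. This is a simplified stand-in, not RocketData’s implementation: SketchDataProvider, the saveToCache and notifyOthers closures, and the queue label are all assumptions chosen for illustration.

```swift
import Dispatch

// Hypothetical, simplified stand-in for a data provider; the names and
// closures here are assumptions, not RocketData's implementation.
final class SketchDataProvider<T> {
    private(set) var data: T?
    private let backgroundQueue = DispatchQueue(label: "sketch.background")
    private let saveToCache: (T) -> Void
    private let notifyOthers: (T) -> Void

    init(saveToCache: @escaping (T) -> Void, notifyOthers: @escaping (T) -> Void) {
        self.saveToCache = saveToCache
        self.notifyOthers = notifyOthers
    }

    func setData(_ newData: T) {
        // 1. Synchronous: the new value is readable as soon as this returns.
        data = newData
        let save = saveToCache
        let notify = notifyOthers
        backgroundQueue.async {
            // 2. Off the calling thread: persist the new value.
            save(newData)
            // 3. Off the calling thread: tell other interested parties.
            notify(newData)
        }
    }
}
```

The caller can read data immediately after setData returns, while the cache write and the propagation happen later on the background queue.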

I’m going to explain in a second how that all works, but first, let’s see how the view controller would respond to this change:

class MyViewController: UIViewController {
    func dataProviderHasUpdatedData<T>(dataProvider: DataProvider<T>, context: Any?) {
        refreshView()
    }
}

You implement one delegate method, dataProviderHasUpdatedData, and in it, all you need to do is call refreshView. It gets called when something somewhere else in our system has changed which affected this person model, so we need to re-render our view. There are a few things to note here. First of all, all the disk access is asynchronous. The other thing is that the data is always in memory and ready to go. When we access that data property of the data provider, it’s there: an O(1) lookup, all in memory. The other thing is that there are no singletons for data. I think it’s a common pattern when you’re trying to share data across view controllers to create a singleton and put everything in there. Today I did a code review, and someone was fixing a bug where we forgot to clean up our state when we logged out. It leads to all kinds of bugs like this.

RocketData is a decentralized data store. The models are only owned by the data providers. This means that when the view controller is popped off the stack, the view controller gets de-initialized, the data provider gets de-initialized, and the models will always get cleaned up. There is no singleton which is holding a huge map of all the models in the system and doing anything like that. The data providers are the owner, and the view controllers are the only ones holding references to these models. This API looks fine, but how does this all work?

General architecture (10:55)

Slide 9 shows the general architecture of RocketData. On the left, you have the view controllers. That is application code. These have references to data providers, which are the main entry point into RocketData. There are two types of data providers. There is a regular data provider and a collection data provider which manages an array of models. These talk to the Consistency Manager, which is the engine which manages all the consistency of all the data providers. I’m going to go over that next.

Both data provider and collection data provider talk to the data model manager. That is a lightweight class, and its only purpose is to forward models from data providers to the cache. The cache isn’t a part of RocketData. Instead, it delegates this back to the application and says, “Hey, please cache these models.” That is intentional, and it’s so that you can use any cache you’d like for your application, whatever makes sense for you, whatever has the right performance profiles and the right features for your application. There are a bunch of open source solutions which you can plug in here.

Let’s take a look at the Consistency Manager on slide 10. The Consistency Manager, as I said, is truly the engine which drives everything. It’s an entirely separate open-source library that RocketData depends on. It’s written completely in Swift, and you can use it independently. It has a pub-sub API.

In our case, the data providers are going to listen for updates on the Consistency Manager. The Consistency Manager notes down that this data provider cares about this data and stores that internally. Again, it doesn’t store any actual models. It stores listeners and IDs, because we want it to have that decentralized model storage. Later in the application, someone might call updateModel on the Consistency Manager. The Consistency Manager takes this new model and looks up whether anyone cares about this change. If someone does, it asks that listener for its current model, uses a map function to swap the new model into the current model, and returns a new immutable model to the data provider. The data provider is then going to tell the view controller to update its data. I went over that pretty quickly, so let’s have a look specifically at how this works.
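Before the detailed walk-through, here is a minimal sketch of the bookkeeping just described: a registry that stores only IDs and weak references to listeners, never the models themselves. SketchListener, SketchConsistencyRegistry, and their method names are assumptions for illustration; the real Consistency Manager API may look different.

```swift
// Hypothetical listener protocol; the real library's API may differ.
protocol SketchListener: AnyObject {
    func modelUpdated(id: String)
}

final class SketchConsistencyRegistry {
    // Weak box so the registry never keeps a listener (or its models) alive.
    private struct WeakBox {
        weak var listener: SketchListener?
    }

    private var listenersByID: [String: [WeakBox]] = [:]

    // Records that `listener` cares about every ID in `ids`.
    func listen(_ listener: SketchListener, toIDs ids: [String]) {
        for id in ids {
            listenersByID[id, default: []].append(WeakBox(listener: listener))
        }
    }

    // Notifies every live listener registered for `id`.
    func publishUpdate(forID id: String) {
        // Drop boxes whose listener has been deallocated, then notify the rest.
        let live = (listenersByID[id] ?? []).filter { $0.listener != nil }
        listenersByID[id] = live.isEmpty ? nil : live
        live.forEach { $0.listener?.modelUpdated(id: id) }
    }
}
```

Holding listeners weakly is what keeps storage decentralized in this sketch: when a view controller and its data provider go away, nothing in the registry keeps their models alive.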

Let’s take a look at this model on slide 11. That is a message model. One key thing is that all of our immutable models can be thought of as trees. In this case, the nodes are going to be classes and the edges are going to be references or properties.

So we have a message model, which might have some sender property, which is going to be a person model. We might also have an attachment property, which is going to be an attachment model. You’ll also notice that some of these have IDs. Message and person both have IDs, but for attachment, for whatever reason, we decided it just doesn’t make sense to have an ID, so it doesn’t. That’s totally fine. We give this model to the Consistency Manager and say, “Hey, I want to listen to changes on this model.” Later in the application life cycle, we might get some kind of push notification, or we might make a network request and get some new data, and this data includes a person model where isOnline is true. You’ll also note this has the same ID, ID 12. What the Consistency Manager does is take the original model and run a map function on it, which allows it to iterate over all of the child models. If any of the child models have the same ID as the updated data, it’s going to swap it out, and it’s going to create a new message model. The message model was immutable, so we couldn’t just set that field. Instead, we end up with a new message model which contains the updated person model. The attachment model is going to be exactly the same as before.
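The swap described above can be sketched like this. The TreeModel protocol, the three model types, and the replacing(_:in:) function are assumptions written for this example; they loosely follow the slide but are not the library’s code.

```swift
// Hypothetical models and protocol, loosely following the talk's example.
protocol TreeModel {
    var modelIdentifier: String? { get }
    // Rebuilds this model with `transform` applied to each direct child.
    func map(_ transform: (TreeModel) -> TreeModel?) -> TreeModel?
}

struct Person: TreeModel {
    let id: String
    let isOnline: Bool
    var modelIdentifier: String? { return id }
    // Leaf node: no children to transform.
    func map(_ transform: (TreeModel) -> TreeModel?) -> TreeModel? { return self }
}

struct Attachment: TreeModel {
    let url: String
    var modelIdentifier: String? { return nil } // no ID, which is allowed
    func map(_ transform: (TreeModel) -> TreeModel?) -> TreeModel? { return self }
}

struct Message: TreeModel {
    let id: String
    let sender: Person
    let attachment: Attachment
    var modelIdentifier: String? { return id }
    func map(_ transform: (TreeModel) -> TreeModel?) -> TreeModel? {
        guard let newSender = transform(sender) as? Person,
              let newAttachment = transform(attachment) as? Attachment else { return nil }
        return Message(id: id, sender: newSender, attachment: newAttachment)
    }
}

// Returns a copy of `root` in which every node sharing `update`'s ID is replaced.
func replacing(_ update: TreeModel, in root: TreeModel) -> TreeModel? {
    if let id = update.modelIdentifier, id == root.modelIdentifier { return update }
    return root.map { child in replacing(update, in: child) }
}

let original = Message(id: "34",
                       sender: Person(id: "12", isOnline: false),
                       attachment: Attachment(url: "a.png"))
let updated = replacing(Person(id: "12", isOnline: true), in: original) as? Message
```

Here original is left untouched, while updated is a new message whose sender is the new person and whose attachment carries over unchanged.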

One advantage of using immutable models is you can toss them around and put them in anywhere, and it’s totally fine. One thing to note here is performance. You may think, well I just went over a very simple example here, but what if my models are really big and complex? Aren’t all these tree operations going to be a little expensive?

Well, yes and no. First of all, when I think of a really big data model, I think of maybe 500 nodes. I think that would be big, and maybe at some point you need to rethink your data. But all the tree operations we run are linear in the number of nodes, O(n), so in this case that is on the order of 500 operations, and on the type of devices that we’re running these days, that is blazingly fast. However, even given that, we decided that for the Consistency Manager, all the expensive operations are going to be run on a background thread, so we never block the UI thread and are never responsible for an app slowdown.

With this method of looking at immutable models, we make our tree structures look more like a database. We only have a reference to the top level and can access message.sender and things like that, but if any of the sub models get updated anywhere in the system, we can trust that they’re going to be updated, and we’re going to get the most recent data in our view, and all our views are going to be in sync with the same data.

Models requirements (16:10)

So, what’s required from our models to do that? Well, RocketData defines a model protocol which you need to implement for your models:

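protocol Model {
    // Identifies which models represent the same data; nil if the model has no ID.
    var modelIdentifier: String? { get }

    // Lets the Consistency Manager skip the update when nothing actually changed.
    func isEqualToModel(model: Model) -> Bool

    // Returns a new version of self with transform applied to each child model.
    func map(transform: (Model) -> Model?) -> Self?

    // Calls visit on each child model.
    func forEach(visit: (Model) -> Void)
}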

There are two required members and two optional methods. The first required one is the modelIdentifier. That returns an optional string and allows the Consistency Manager to say, “Does this represent the same data?” The second thing is isEqualToModel. That tells the Consistency Manager, did something actually change? If nothing changed, it’s going to short circuit and say, “No need to update yourself.” If your model conforms to the Equatable protocol, you don’t need to implement this; it’s going to pick up the Equatable conformance automatically.

The next two functions are optional. If you want to get the sub model consistency like I showed in the previous slide, you need to implement these. But if you just want top level consistency because you don’t care very much about the sub models, you can leave these out. The map function takes a transform closure. The responsibility of the model is to run this closure on all of its child models to get a new set of child models, and then return a new version of itself with all these new child models. The Consistency Manager uses this to generate the new models as I showed in the previous slide.

The other, and last, optional function is the forEach. That should simply iterate over all the child models you have and call this visit function. You can even implement forEach in terms of map, so you don’t need to write it on all of your models, but it’s a little bit more performant not to have to create a new version of self every time. It’s up to you how you implement this. One other thing about models is that the library is written in Swift, but it supports Swift classes and structs, and Objective-C classes. You can use whatever you want, whatever makes sense for you.
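As a rough illustration of implementing forEach in terms of map, here is a sketch in current Swift syntax. SketchModel, Leaf, and Pair are hypothetical types invented for this example, and the protocol is deliberately smaller than the one shown above.

```swift
// Hypothetical protocol for illustration; smaller than the Model protocol above.
protocol SketchModel {
    // Rebuilds this model with `transform` applied to each direct child.
    func map(_ transform: (SketchModel) -> SketchModel?) -> SketchModel?
}

extension SketchModel {
    // Default forEach built on map: visit each child and hand it back unchanged.
    // Simple, but it rebuilds self on every call, which is the cost mentioned above.
    func forEach(_ visit: (SketchModel) -> Void) {
        _ = map { child in
            visit(child)
            return child
        }
    }
}

struct Leaf: SketchModel {
    let name: String
    // Leaf node: no children to transform.
    func map(_ transform: (SketchModel) -> SketchModel?) -> SketchModel? { return self }
}

struct Pair: SketchModel {
    let left: Leaf
    let right: Leaf
    func map(_ transform: (SketchModel) -> SketchModel?) -> SketchModel? {
        guard let newLeft = transform(left) as? Leaf,
              let newRight = transform(right) as? Leaf else { return nil }
        return Pair(left: newLeft, right: newRight)
    }
}
```

With this extension, a model only has to write map, and a hand-written forEach becomes an optimization rather than a requirement.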

Collections (18:09)

Now on to collections. For us at LinkedIn, we’ve looked at all of our screens, and we noticed a lot of them are lists of models. We wanted to provide first class support for collections. As I said earlier, there’s another variant of the data provider, which is a collection data provider, which allows you to have an array of models. There are three features that we wanted for our collections: we wanted them to be ordered, easy to edit, and syncable.

The first one we wanted, ordered, might seem obvious, but when you look at Core Data, a lot of the collections are unordered and instead you have to define some predicate, like ordered by first name or something like that. That is great because it allows you to use some really interesting queries on your data, but for us, the server dictates the ordering. For our feed, there’s no inherent ordering. We have some crazy backend algorithm which generates this ordering, it gives it to us, and we need to use that ordering. For us, we made the decision that all of our collections were going to be arrays, not sets, so they’re strictly ordered and whenever you retrieve them from the cache, you get them back in the same order.

The other thing is easy to edit. They have insert atIndex, remove atIndex, update atIndex, all those things you’d expect for collections, making them very easy to interact with from a view controller.

The last one is that we wanted them to be syncable. If I have two view controllers that have the same data, I can give the collection data provider an identifier. If it’s the same identifier, if I remove a model from one, it will automatically get removed from the other. It will then also call a delegate method saying, “Hey, I’ve removed atIndex: 6,” so you can run all your table view animations and all that type of stuff as you normally would with some other framework. Again, this means that you can have decentralized storage, so you don’t have to have one Collection Manager which both view controllers talk to. You can have two separate data providers, but it’s all kept in sync.
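To give a feel for the "removed atIndex" callback, here is a tiny sketch of how removed indexes could be worked out by comparing two versions of a collection by ID. The removedIndexes function is a hypothetical helper for illustration; it is not how the collection data provider is actually implemented.

```swift
// Hypothetical helper: indexes in `old` whose IDs are missing from `new`.
func removedIndexes(from old: [String], to new: [String]) -> [Int] {
    let remaining = Set(new)
    return old.indices.filter { !remaining.contains(old[$0]) }
}

// The second view controller's copy of the collection, before and after
// another provider with the same identifier removed one model.
let before = ["a", "b", "c"]
let after = ["a", "c"]
let removed = removedIndexes(from: before, to: after)
```

A view controller could feed indexes like these into its table view’s row deletion animations.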

Cache (20:10)

The last piece is the cache. As I said earlier, the cache isn’t implemented by RocketData. It instead delegates this back to the application. It does this via a delegate. Let’s have a look at what that delegate looks like:

public protocol CacheDelegate {
    func modelForKey<T: Model>(cacheKey, context, completion)
    func setModel<T: Model>(model, forKey cacheKey, context)
    func collectionForKey<T: Model>(cacheKey, context, completion)
    func setCollection<T: Model>(collection, forKey cacheKey, context)
    func deleteModel(model, forKey cacheKey, context)
}

There are five methods. The first one, modelForKey, gives you a cache key and expects you to fetch a model from the cache if you can. The next one, setModel, gives you a model and a key, and says, “Please go set these in the cache.” The next two, collectionForKey and setCollection, are the same thing for collections, and the last one is deleteModel, which says, “Please remove this from the cache.” The signatures shown here are abbreviated; the parameter and completion types are left out.

As you might notice, this looks a lot like a key-value store API. RocketData is very well set up for use with a key-value store. That is good for a few reasons. One is that there are a bunch of open-source key-value stores. There are plenty of options for you to choose from, and you can pick whatever is best for your application. The other thing is that when we set this data in the key-value store, we’re going to serialize the models and put them in there. When we load them out, it’s possible that this data might be very old. Maybe the user opened the application a month ago or so, and this is no longer valid data. In this case, we can simply try to parse it into a model, and if it fails, that’s fine. It’s a cache. It will say, “Oh, we got a cache miss. Don’t worry, we’re going to fetch from the network in a second.” In that way, we actually don’t need to write any migrations, and we never need to delete the cache when a new version of the app ships or anything like that. That is great, but I’m not actually telling the whole story.
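The "a failed parse is just a cache miss" idea can be sketched with Foundation’s JSONDecoder. The CachedPerson type, its fields, and the loadFromCache function are assumptions for illustration; the talk does not say which serialization format or cache LinkedIn uses.

```swift
import Foundation

// Hypothetical current schema; an older app version may have stored different fields.
struct CachedPerson: Codable {
    let id: String
    let isOnline: Bool
}

// Returns nil (a cache miss) when the stored bytes no longer match the schema.
func loadFromCache(_ data: Data) -> CachedPerson? {
    return try? JSONDecoder().decode(CachedPerson.self, from: data)
}

// Written by a hypothetical older version, before `isOnline` existed.
let stale = Data(#"{"id":"12"}"#.utf8)
// Written by the current version.
let fresh = Data(#"{"id":"12","isOnline":true}"#.utf8)

let staleResult = loadFromCache(stale)
let freshResult = loadFromCache(fresh)
```

In this sketch the stale entry simply decodes to nil, so the caller falls back to the network and no migration code is needed.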

What we want is cache consistency. To do that we need to normalize our data. What do I mean by that? In slide 19, on the left, we have a de-normalized model or a tree structure. That is what our models look like. If we serialize this whole thing and put it as one entry in our key-value store, then later, if we insert a person into the key-value store and then read back the message, we won’t get this updated person in our read. Instead, we’d get the original model. We wanted to take this one step further and try to get full cache consistency too. On the right is how I picture a normalized version of this model:

[
34: MessageModel,
12: PersonModel,
42: MessageModel,
51: PersonModel
]

It’s a dictionary where on the left you have IDs and on the right you have models. But these models aren’t full trees. They’re just one node. So it’s just the message model and just the person model. Effectively, it’s like flattening this whole tree, and that’s what our normalized model is. That’s how we want to cache stuff. When we cache this, we want to split it apart, flatten it, and cache everything separately, and then when we read it again, we want to construct it back again from this key-value store. That isn’t completely trivial to do, I’ll be honest. We actually have an internal library that does this. Once you have written all of the map and forEach methods on your models, it’s not that hard to do. The other thing is, you might just say, “Well, my application doesn’t really need that. I’m fine with not having cache consistency because I do have consistency while I run the application.”
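Here is a minimal sketch of what flattening and rebuilding could look like. The Node and FlatEntry types and the normalize and denormalize functions are assumptions invented for this example; LinkedIn’s internal library is not public, and this is not its implementation.

```swift
// Hypothetical node type for illustration: an ID, a payload, and child nodes.
struct Node {
    let id: String
    let payload: String
    let children: [Node]
}

// One cache entry per node: its own payload plus the IDs of its children.
struct FlatEntry: Equatable {
    let payload: String
    let childIDs: [String]
}

// Splits the tree apart and stores every node under its own ID.
func normalize(_ node: Node, into store: inout [String: FlatEntry]) {
    store[node.id] = FlatEntry(payload: node.payload,
                               childIDs: node.children.map { $0.id })
    for child in node.children {
        normalize(child, into: &store)
    }
}

// Rebuilds the tree for `id` from the flat store.
func denormalize(id: String, from store: [String: FlatEntry]) -> Node? {
    guard let entry = store[id] else { return nil }
    var children: [Node] = []
    for childID in entry.childIDs {
        // A missing child makes the whole read a cache miss.
        guard let child = denormalize(id: childID, from: store) else { return nil }
        children.append(child)
    }
    return Node(id: id, payload: entry.payload, children: children)
}

var store: [String: FlatEntry] = [:]
let person = Node(id: "12", payload: "offline", children: [])
normalize(Node(id: "34", payload: "hello", children: [person]), into: &store)
// A later write of just the person, without touching the message entry.
store["12"] = FlatEntry(payload: "online", childIDs: [])
let message = denormalize(id: "34", from: store)
```

Because the message entry only stores the person’s ID, reading the message back picks up the newer person, which is the cache consistency described above.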

Conclusion (23:42)

Let’s take a look at the big picture again about the application architecture on slide 20. If you remember, data providers provide a simple API for accessing and changing data. Getting and setting data is synchronous, but under the hood, there’ll be a bunch of asynchronous operations which will run and make sure all the other data providers are updated, and also update the cache. The Consistency Manager makes RocketData seem like a database even though it’s not, by ensuring that changes in one data provider are automatically updated in another data provider.

Hopefully, you’ll at least see that immutability doesn’t mean you need to sacrifice consistency. You can have both. Immutability has tons of advantages, and I love coding with immutable models. It makes me really happy. Both the Consistency Manager and RocketData, all written in Swift, are completely open-source. Please go check them out and let me know what you think.

Q & A (24:39)

Q: Is there any way to batch updates together?

Peter: Yes. There are two reasons to do this. One is because you might not want to run all these operations. The other is that you might want to make sure that two things update on the screen at exactly the same time, because since it’s asynchronous, it might be a millisecond apart or so. So, yes, you can do batch updates. You can give it an array of models; it will process them all at once, and update everything at most once.

Q: I had a question about the role of the data provider, and maybe I’m not understanding how it’s laid out. You have that diagram with the arrows going back and forth between the Consistency Manager and the data provider. I’m wondering, what’s the function that the data provider is serving? Why are there these two levels of indirection? If I have that relationship with the data provider, and the data provider has that relationship to the Consistency Manager, then isn’t that just window dressing in the middle? Why don’t I just have those two arrows going between the Consistency Manager and me, and then isn’t the Consistency Manager the singleton? What’s the thinking behind having these distributed data providers when they’re just a channel into the same consistent managing, singleton-y thing center?

Peter: There are two reasons. One is that the Consistency Manager API is not so user-friendly. If you look at the DataProvider class, it’s maybe 60 lines of code. All it does is implement these Consistency Manager APIs and give a better, friendlier API. You could say there is an extra level of indirection there. The other thing is that the Consistency Manager doesn’t know anything about the cache; it’s just for consistency. The data provider, every time you set data, forwards that message to the cache automatically. The bigger one is the collection data provider, because, for various reasons, we decided that the Consistency Manager wouldn’t special-case collections. It was just going to deal with trees. So, the way that the collection data provider works is it assumes that your collection is a tree with one node per element in the array. But that mapping isn’t completely trivial, and it’s not obvious as a user that that’s how you should set up a collection. The collection data provider implements some of the logic there. It also does some of the logic to work out which indexes changed and stuff like that. It wraps all that logic up. In general, all of RocketData isn’t super heavy, because the Consistency Manager does the bulk of the work, and really RocketData is just a nice API on top of that, plus caching.

Peter Livesey


Peter currently works at LinkedIn (https://www.linkedin.com/) as a Software Engineer working on iOS infrastructure. He spends his time thinking about networking, caching, consistency, immutability, threading, testing and scaling infrastructure to developers and users. He's helped ship the last two iterations of LinkedIn's flagship application.