This 360|AnDev talk from 2016 discusses advanced git techniques and how you can use different branching strategies to achieve a rapid workflow depending on the structure of your team (including a demonstration of the branching strategy the Android team uses at Shazam), followed by an assortment of “hacks” that allow you to optimize the way you work and share code with others.
My name is Savvas Dalkitsis. I was a Senior Software Engineer at Shazam when I submitted this talk. I have since moved on; I’m now Lead Android Developer at ASOS.com, and you can find me on Twitter as @geeky_android.
Introduction (0:30)
What is this talk about? Obviously, it’s about version control, or source control, whichever you want to call it. We’ll be focusing, of course, on git. There are a lot of other source control systems out there: SVN, Mercurial, Perforce. Why do we need source control? I hope that everyone here is using some sort of source control, and hopefully everyone is using git. Is anyone not using any of these aforementioned systems? Ok, why do we need those?
Why Source Control (1:00)
That’s a picture of a hard drive I salvaged out of my 486 computer, which I had when I was a kid. And it was really nice, because I found some old code that I wrote when I was ten years old–it was terrible, of course, as you would expect. But the problem wasn’t the code being terrible.
I had a big, gigantic folder with all my source. Some of it was in an uncompilable state. I wanted to try to compile it again, to see what my thought process was, but that was not possible. I typed some things, and I left it, never actually trying to compile it again. And I also lacked a historical view of how I developed the software that I had developed when I was a kid. So I lacked that ability to go back and see my thought process as I was writing this software.
Now another benefit of using source control is when you work in a team–of course, when you work by yourself, you sort of have a mental image of your entire project in your head, which, of course, can also fade with time. But when you’re working with a team, it’s imperative to put some order in the chaos of everyone trying to develop on the same piece of software. It also helps with accountability, not necessarily in a bad way, but when you encounter a bug and you need to fix it, you can easily go back and see who worked on that piece of code, so you can go and ask for help or any clarifications. Without source control, you wouldn’t be able to do that.
As things expand, you’ll also have this other problem: this is a view of Shazam’s global offices–we had seven, I think, and two or three of them actually had engineering teams in them, and we did some sort of code sharing between them. Mostly, the teams are isolated, but one of our team members actually moved to the States at some point, so having a distributed source control system was actually very helpful. It solves many problems–like I mentioned before, accountability–but it also allows for different types of workflows.
You may have heard that at many big organizations like GitHub, Stack Overflow, even for the Linux kernel itself, code is developed by very distributed teams across the world that don’t often update, let’s say, their respective repositories with what other teams are doing. They may have a set cycle where they try to merge everything together every few months. Using git allows you to have a sort of semi-detached repository that you can then share with the rest of your company.
How does source control help us solve many of these problems? Before we get into how we used it at Shazam, I’d like to begin by giving a brief overview of git, even to those that have used it and know how to use it, because to really understand how something can help you solve your problems, you have to really know how it works. How many of you would say that you’re experts at git? How many of you know the internals of how git works, like what the commits are and how they relate to each other?
Differences: git vs SVN (4:16)
To begin, I’ll compare git to another very well-known source control system. SVN was previously a very popular source control system that works very differently from git, and I will use it as a comparison to highlight some of the good features that git has.
One of the main differences is that git is distributed and SVN is not. What does that mean? SVN is centralized. In SVN you have a central repository that lives somewhere on a server, and when you need to work using code in that repository, you have to check out a version of that repository locally. So, two users will check out a version of that repository, they can check out the whole repository, but the thing they have on their workstations is not an actual repository, it’s sort of a mirror of the original. The only way to do work is to actually be in constant communication with the repository. And we’ll talk about that later on.
But with git, there is no central repository. You can use it in a way that you can have a central repository that is your ultimate source of truth, but that’s not necessary. Every user checking out that repository is actually cloning an entire working repository that can be used as a source for other people to use as a centralized source control.
You can, of course, use this in a sort of similar way to SVN, where you have your central repository, and everyone keeps committing to the same place. But you can go crazy and have repository two talk to repository four directly, or you can go completely crazy and have all of the repositories sort of talking to each other, and occasionally updating one source of truth (which I would not recommend, because it is not very helpful).
Another main difference is that git can work offline while SVN cannot. That means when pushing code in SVN–committing, as it’s called–you have to be connected to the network. You have your local changes, you commit them to the central repository, and they are there for everyone to get them. When you’re offline, you can’t do this, so you can’t commit; any changes you have made will have to remain on your machine until you fix the network problem or whatever the actual issue is.
That can lead to many problems, because it doesn’t allow you a very flexible way of working when you don’t have network access. And this is not necessarily when you’re traveling on a plane to go to speak at a conference. It’s also problematic when you’re inside an office and there’s an IT problem, like your router–the main router that everyone is using–goes down, and suddenly your team can’t continue working because they can’t push commits.
In git, you can queue commits locally; promoting those commits to the remote is what’s called pushing. The terminology is similar to other source control systems, but not exactly; so initially, when git was gaining in popularity, a lot of people got confused by the terms committing and pushing. Committing is a local operation, and you can push when you have network. This allows you to just go on about your job without actually having to talk to the central repository.
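The split between committing and pushing can be tried out without a network at all. The sketch below is only an illustration: the bare repository in a temporary directory stands in for the server, and the names central.git, laptop and notes.txt are made up for this example.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/central.git"              # stands in for the remote server
git clone -q "$work/central.git" "$work/laptop" 2>/dev/null
cd "$work/laptop"
git symbolic-ref HEAD refs/heads/master             # make sure the branch is called master
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Two commits made "offline": neither of these commands talks to the remote
echo "one" > notes.txt && git add notes.txt && git commit -q -m "First change"
echo "two" >> notes.txt && git commit -q -am "Second change"

# Back online: a single push sends both queued commits
git push -q origin master
pushed=$(git --git-dir="$work/central.git" rev-list --count master)
echo "commits on the remote: $pushed"
```

Until the push runs, the remote has no idea the two commits exist; they live only in the local .git folder.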
Another main difference, which is a bit more hidden, is that the logical unit in git is a commit. And the logical unit in SVN is a revision.
What that means is that in SVN, the entire snapshot of your repository at any point in time is the logical unit. SVN works with that snapshot in mind. It doesn’t actually duplicate the entire tree every time there’s a new revision–that would be wasteful–but the way it operates is by treating each revision as one snapshot of your entire system. So every time you do another commit, logically, SVN treats this whole snapshot as one revision. This leads to problems that I will explain in a bit.
In git, the logical unit is a commit: a very isolated set of changes. And it’s not just the code that you commit; there’s some other data associated with it. Anytime something changes, that change is the difference from the previous commit. There’s no snapshot; the idea of a snapshot of your entire source doesn’t exist in git.
Now this is a very sort of technical and esoteric difference: in SVN, the actual files that are stored on an SVN repository are your repository. When you check out a revision, the entire source is checked out on your machine into the file system. So you will see there are folders for your branches, there’s a separate folder for your tags, and there’s a .svn folder in each of those, containing metadata about each commit.
In git it’s slightly different. The entire repository is downloaded to your local machine inside a .git folder, and the stuff outside of that is your current workspace, which doesn’t necessarily have to be the entire repository. You can have a specific branch, and that just leaves that specific branch’s code checked out into your file system.
Another main distinction is what branches are. In SVN, branches are folders, and in git, they’re just commits.
If you want to create a new feature branch in SVN, you’re literally creating the folder: if you have two feature branches, they are separate folders that live on your file system, side by side. Every time you have a different branch, you create a new folder for that, next to all the other ones.
In git, a branch is a labeled commit. A commit is made up of a few key parts:
- The code diff (basically what has changed from the previous commit–there’s no actual previous commit, but I will get to that in a bit).
- A set of metadata: the author’s name, when the commit was made, some other stuff.
- A hash code pointing to the previous commit. This is very important, because that’s how git identifies the history of a branch.
All of that information together makes up a commit. If any of these bits of information changes, you are talking about a different commit. So you can have exactly the same code diff, but if the author changes, or the time changes, or the previous hash changes, you’re now talking about a different commit.
All of that is calculated and combined to a cryptographically secure hash that now identifies this commit. So any time you change any of that information, this commit hash changes as well. And all of that is basically represented as a circle, usually, and that gets put into a tree that maintains the history of git.
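To see that the hash really depends on every part of the commit, the following sketch builds commits by hand with git’s low-level commands. It is an illustration under assumptions: the file name, identity and dates are invented, and the content is passed in as a tree object, which is how the low-level commands refer to the committed files.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > file.txt
git add file.txt
tree=$(git write-tree)      # tree object describing the staged content

# Same content, author and message; only the timestamps differ
first=$(GIT_AUTHOR_DATE="2016-01-01 12:00:00 +0000" \
        GIT_COMMITTER_DATE="2016-01-01 12:00:00 +0000" \
        git commit-tree "$tree" -m "Add file")
second=$(GIT_AUTHOR_DATE="2016-01-02 12:00:00 +0000" \
         GIT_COMMITTER_DATE="2016-01-02 12:00:00 +0000" \
         git commit-tree "$tree" -m "Add file")

# Same content, message and timestamps as $first, but with a parent commit
child=$(GIT_AUTHOR_DATE="2016-01-01 12:00:00 +0000" \
        GIT_COMMITTER_DATE="2016-01-01 12:00:00 +0000" \
        git commit-tree "$tree" -p "$first" -m "Add file")

printf '%s\n' "$first" "$second" "$child"   # three different hashes
git cat-file -p "$child"                    # prints tree, parent, author, committer, message
```

The last command prints the raw commit, which makes the ingredients of the hash visible: the content reference, the parent hash, and the metadata.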
So in this case, that previous hash that we talked about is basically the hash code of this entire commit. And that’s how git builds the history that it has: it has no notion of folders as branches. So a branch in git is literally giving a name to one of these hashes–there’s nothing special about it–so hash a4bgf3 is now called feature_1. This is just a label to a hash code, there’s nothing different, there’s nothing special about it. master is the same, and because of that, of course, master doesn’t hold any special value in git, whereas in SVN, trunk is the root of your repository.
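A quick way to convince yourself that a branch is only a name for a hash is to create one and look at what git stored. This is a sketch with made-up file names; reading the file under .git/refs/heads assumes the default on-disk layout of a freshly created branch.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "a" > a.txt && git add a.txt && git commit -q -m "First"
echo "b" > b.txt && git add b.txt && git commit -q -m "Second"

first=$(git rev-parse HEAD~1)
git branch feature_1 "$first"             # give that hash a name
label=$(git rev-parse feature_1)

cat .git/refs/heads/feature_1             # the branch is a small file holding the hash
cat .git/refs/heads/master                # master is stored in exactly the same way
```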
Another difference: SVN preserves history, but git allows you to change it, if you don’t set up any rules.
In SVN, as we talked about, history is preserved as a sequential increasing number of revisions. So regardless of where you push code, a global number changes and increases by one. So if you push to a branch, and I push to trunk, your commit will increase the number by one, and my commit will also increase the number by one. So if you want to remove an older commit, the only option you have is to create a new commit that reverts the previous one, but the history is still maintained. In this example, you can go back and see that number three did something, and then number five changed it.
You can do the same in git: you can create a new commit that is the reverse of another commit, but it also allows you to do something clever, where you can basically say, I don’t want that commit, I don’t like it anymore, it contains some sensitive information–like, I don’t know, I put my keystore in my git repository and I want to get rid of that so future users cannot get access to that. So you can literally remove that from history, and now the commit that was on top of it will be linked to the previous commit, and because of that, the hash that we talked about pointing to the previous commit has changed. Therefore the commit itself is now a different commit, and it gets a new hash.
So even though the diffs are the same, the commit is different. And if there are any conflicts when applying that commit on top of the earlier tree, this is the point where you would have to fix those conflicts before you actually make the commit final.
Working with Remotes (13:38)
The final piece to understanding how to use git for all of these issues is to understand how remotes work. We talked about the remote repository and the local repository, we said there’s no actual difference between them, and that is true.
Let’s say you actually check out a remote repository. There are some labels on that repository: you get your master, and then you get your local master, and you get this weird one called origin/master. “Origin” is just a name that you use to identify the remote repository. Now your master, and the remote repository master, are not exactly the same thing; they’re just labels that point to a hash. So your master and your origin/master can actually diverge from the actual repository master. I will show you in a bit how that affects trying to resolve conflicts.
Let’s look at a scenario where your local repository is in the same state as the remote, and you do a local commit that comes on top of your last one. Now your master has moved on, it points to the new commit, and origin/master points to the old one. At this point, it’s very easy: you can actually push your changes to the remote repository, and everything will be fine.
But if, in the meantime, someone pushes code to the remote repository, creating a new commit, now your origin/master doesn’t point to the actual remote master, so you have to fix this somehow. One common way to fix this is to do a merge, so you effectively create a commit that merges the new commit that was put on remote, with your own local ones that haven’t actually made it up yet, so you get a tree that looks like that.
As you can see, there is this server, so this commit, this branch here, reflects the remote repository structure. That’s your local commit, and then you have a new commit trying to merge the changes between these two. To me, that looks a bit ugly, because now I’ve created a commit that doesn’t actually serve any purpose, other than trying to fix a problem that it shouldn’t have had in the first place.
That’s a very common pattern, and a lot of people use it. But because git allows you to rewrite history, you can do something different. It’s called a Rebase.
Rebasing allows you to put away some of your commits for a moment, change your current tree, and then reapply them on top of it. So we’ll do a Rebase: we’ll move this commit that hasn’t actually made it to the remote on the side, and we’ll get the actual remote commit. Now our origin/master points to the actual remote master, and we will reapply this commit on top of our new origin/master, and of course, because the link has changed, as we talked about before, this is now a different commit.
This is now safe to push, and there’s no third commit. Everything is safe and nice and smooth; there’s just one line of history. It’s very easy to figure out what’s happening.
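The whole scenario can be replayed locally. The following is a sketch under assumptions: two clones named alice and bob (invented for this example) share a bare repository in a temporary directory, and their changes touch different files so the rebase applies cleanly.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/central.git"
git --git-dir="$work/central.git" symbolic-ref HEAD refs/heads/master

new_clone() {   # helper for this sketch: clone and set an identity
  git clone -q "$work/central.git" "$work/$1" 2>/dev/null
  git -C "$work/$1" symbolic-ref HEAD refs/heads/master
  git -C "$work/$1" config user.name "$1"
  git -C "$work/$1" config user.email "$1@example.com"
}

new_clone alice
cd "$work/alice"
echo "base" > app.txt && git add app.txt && git commit -q -m "Base"
git push -q origin master

new_clone bob                       # bob starts from the same commit

# Alice pushes again while bob is working locally
echo "alice" >> app.txt && git commit -q -am "Alice's change"
git push -q origin master

cd "$work/bob"
echo "bob" > bob.txt && git add bob.txt && git commit -q -m "Bob's change"
before=$(git rev-parse HEAD)

git fetch -q origin                 # origin/master now matches the remote master
git rebase -q origin/master         # set bob's commit aside, replay it on top
after=$(git rev-parse HEAD)
git push -q origin master           # a plain fast-forward, no merge commit

merges=$(git rev-list --merges --count master)
total=$(git rev-list --count master)
echo "before=$before after=$after merges=$merges total=$total"
```

The commit keeps its message and its changes, but its hash differs before and after the rebase because its parent changed.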
I’ve left that old commit there because git doesn’t forget, so any commit that you made in git is always there, but unless it’s referenced by a label, it will not be visible in your tree. So if you open any popular tool for viewing git trees, the commits you see are not necessarily the entire git tree: it could potentially have more commits that are invisible to you. There is a way to recover these later on. You can also instruct git to get rid of them if you don’t need them. But this commit will be there, and you have to reference it by hash because it doesn’t have a label.
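Here is a small sketch of a commit that has dropped out of the visible tree but is still there. It uses git commit --amend as a simple way to replace a commit with a new one (an assumption of this example; a rebase leaves old commits behind in the same way), and the file name is made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "v1" > app.txt && git add app.txt && git commit -q -m "First"
echo "v2" > app.txt && git commit -q -am "Second"
old=$(git rev-parse HEAD)

# Rewriting the last commit creates a new commit; no label points at $old anymore
git commit -q --amend -m "Second, reworded"

visible=$(git log --format=%H | grep -c "$old" || true)   # 0: not in the visible history
kind=$(git cat-file -t "$old")                            # the object still exists
git show -s --format='%h %s' "$old"                       # and can be referenced by its hash
```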
Shazam’s branching strategy (17:05)
The way we used git at Shazam can be boiled down to this sentence: Always push to master, branch when freezing for a release.
What does that mean? Before I explain it, I’ll just give you an overview of how our git tree used to look. It was a sequential, single line of commits, very easy to understand. And every time there was a release that we needed to do, we would freeze that branch at that point in time–we would call it an ongoing branch, we would also tag it as a release candidate. If we found any bugs in that release, we would do hot fixes on that branch and merge them back to master so that master was up to date. We kept doing that until we were satisfied with the release. At that point, we would get the final release candidate and actually push to the market release.
This allowed us to have two parallel modes of working: one part of your team could keep working on features for the next release, while a few of you can just harden the actual release candidate. Now you can do this with many other different strategies, but the main benefit that I see with this is that it allows you to very clearly see the history of your commits. It’s a very simple, straightforward line, there’s no complexity here. You don’t have to do any mental math to figure out where branches come from. You don’t have to use any special commands to figure out in which branches each commit exists. It’s just one branch.
Now to do this, there are some prerequisites. You can’t just jump into this mode of working. You have to have a Behavior-Driven Development (BDD) and Test-Driven Development (TDD) mentality, because every commit on master could potentially break everything, and you don’t want that, because you want to always be ready to release. The whole idea is your product owner could come in at any point and say, “Yep, let’s release what we have now, and we need to do this today. So we need to cut a release candidate.” BDD and TDD allow you to have your project in a releasable state; it might not be completely ready to push, but with only a couple of commits, you should be able to create a release candidate ready for the Play Store.
The key is you need continuous integration that runs your entire BDD and TDD test suites every time someone commits. There’s a different talk that sometimes we used to do about how we achieve this at Shazam, because, of course, acceptance tests on Android are very slow, notoriously slow, so we have tools for speeding things up. And also we have pre-commit hooks that would prevent people from pushing code that breaks the build. So we have an integration plan in our Jenkins where when you want to push code on master, the moment you push, your code doesn’t actually get promoted to the git repository; but a local branch is created, gets pushed remotely, so that Jenkins builds it, runs the entire test suite; and only when that is green does that push get promoted to master so that other people can use it.
The key part of this is the mentality of always Rebasing your code, so when you come to work in the morning, the first thing you do is Rebase any local changes you haven’t pushed on top of the current master. Rebasing also allows you to very easily fix conflicts. When you have long running feature branches, and then you try to merge–let’s say you have two feature branches, and you always keep them up to date with develop or master, or whatever you want to call your base branch. And you think, “Okay, I have no conflicts, because I keep fixing it every day,” but then when you integrate your branch into develop or master, after another feature branch gets merged back, you get this weird conflict between the two branches, and now you have to go to the people that actually worked on that branch and try to resolve things that you might not be very familiar with.
If everyone is working on master, those changes are literally one or two commits at any time. So it’s very easy to fix, very localized, and you never run into surprises where suddenly, “yes, we have to release tomorrow, let’s get these two feature branches in the release,” or there’s a “merge integration hell, we can’t fix this.” This helps with that.
Because of everyone pushing in the same branch, it also encourages you to write your code in a way that is modular. You have to create your code in a way that, even if your feature is not ready, you can still push the app into the Play Store, because your feature is actually disabled. And the only way to have what are called “feature flags” in the back end world is to have an architecture in your app that allows you to easily swap out implementations of things, and also have a list of AB tests, or flags, that enable you to switch features on and off.
So to reiterate the benefits: it keeps us true, because having this process forces us to basically always think, “I need an acceptance test for this; I need to make sure that it’s covered by tests as much as possible. I can’t push code that is not production ready. And of course, I have to modularize my code.” The tree is very clean and easy: when you bring in new people into your team, it takes them less than a day to figure out how your code is structured. It’s very simple. I will talk in a bit, from personal experience, about how that affected me.
We have no code integration problems, because, like I said, the conflict resolution has to happen at a very isolated and local point. Releases are quick because of the previous points; you can always, at any point, cut a release branch, and harden it. Turning off features is very easy, because of that modularity, so even if there’s a bug after you actually release something, you can easily do a point release that turns that feature off, without having to run a regression test again, because that feature was very easily modularized and swappable.
And it allows continuous deployment. This is not really a concern with Android, because you never want to actually have continuous deployment pushing updates to the Play Store like, three or four times a day. People would get really annoyed.
But this works for back end teams and web teams, where you can actually have a flow that every time you commit code, three minutes later, it’s actually live in production. This might not be for everyone, and it takes some time getting there, and it requires a very mature team–not technically mature, like with skills, but mature with each other, that know how to work together, that have worked together for years. So you can’t just immediately jump there. It takes some time. I think it’s a very useful thing. But for those that, you know, this is not their cup of tea, the most popular branching strategy in git is called GitFlow.
That’s the link to the original article that proposed it. Everyone was sort of doing a version of this anyway, but this is the first time that this was given an official name. And everyone does a version of GitFlow of their own anyway, even now that we have an official definition.
GitFlow basically means you have three main branches:
- Your master branch, where you don’t actually push any code directly. A master branch only contains the commits of every release that you’ve made after it’s actually gone live.
- What you do work on is the develop branch. You have a develop branch, which was initially branched off of master, where you either do commits that are very small and isolated…
- …or, from the develop branch, you actually branch out feature branches.
So the idea is, you branch from develop, you do some work on that feature, you keep merging from develop into your feature branch, so you keep it up to date, and then when you’re ready, you merge that branch back into develop, and it goes into testing. And if you have multiple ones running in parallel, this inflection point here is where you have to fix those conflicts. If you have an integration problem, that’s where you have to fix it.
Once the develop has gotten to a stage where the team feels it’s ready for release, you merge that branch back into a release branch that goes into your continuous integration plan. That builds the actual binary, it’s tested, and if there are any problems, you do fixes on that branch–they’re called hot fixes–or you create hotfix branches from that release branch. They get merged back into release, and release always gets merged back to develop. And once a release is final, you merge the release branch into master, and you tag it.
It looks a bit complicated, but it’s not as difficult as it sounds. The difficulty of this is the tree. Inspecting the tree when you come into a new team and you’re not really familiar with this flow is very difficult, because you get a lot of parallel running branches and it’s very difficult to trace their history: have they been merged recently, are they up to date, do I need to do another merge, where do I merge from? So it’s very complicated, but this allows you to do pull requests and code reviews, and very popular tools like GitHub will do this automatically.
Now that’s the branching strategy that my new company, ASOS, is using, and that’s where my personal experience comes in. Getting the entire project, you have to work with a completely new code base, and not only that, you also have to understand a branching strategy that might confuse you, and finding code might be tricky. So onboarding new people is a bit difficult with this pattern, but it works; it’s been used for many years now. And because many popular tools like GitHub or GitLab use this pattern, there’s a very easy, out-of-the-box way to do pull requests.
Tips and Useful Commands (27:15)
There are many commands that are so useful in git that most people don’t know, because they rely on graphical tools for using git. But git is a very powerful system, so to talk about a few of them, I have a hypothetical repository: this has like four commits, nothing special. No branches, no nothing.
The first and most useful command is Rebase. I’ve already talked about Rebase, but there’s a special version of Rebase, it’s called an Interactive Rebase. You can invoke it by saying rebase -i, and you can give it the name of a commit, or a relative name, so HEAD is the current checked out commit, you can also say HEAD~3, that means go back three commits from the current head.
That is basically a shorthand for this hash there, 57909a6, but you can give that hash and it will work. This creates this file for you that you can edit. It lists out all the commits with their actual commit messages, and you can do a lot of cool things with this. The first thing you can do is you can delete a commit. This is how you change history. If I go and literally delete that second line from this file, and save it, git will actually try to reapply all of these commits in order that they appear in this file, so that second commit will disappear.
And because of the way that every commit gets applied to the previous one, if there’s a conflict between the second commit and the fourth commit, that’s the point where I would fix it. So if you’re trying to Rebase a big tree, and there’s conflicts everywhere, you fix them sequentially, one at a time. If you need to merge a branch that has 30 commits into a branch where there are a lot of conflicts, you have to fix all the conflicts at once. Especially if it’s code that you haven’t worked on, there’s no easy way for you to isolate the changes and see them in relation to what they’re actually trying to do. Rebasing will actually prompt you on every single commit if there’s a conflict, and you can easily see that commit message and see what the code was trying to do.
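Deleting a commit with an interactive rebase can be scripted for demonstration purposes. In the sketch below, the editor that would normally open is replaced by a sed command through the GIT_SEQUENCE_EDITOR environment variable; the four commits and their file names are invented, and each commit touches its own file so nothing conflicts.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

for n in 1 2 3 4; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "Commit $n"
done

# The todo file for HEAD~3 lists three lines: Commit 2, Commit 3, Commit 4.
# Instead of editing it by hand, sed deletes the first line (Commit 2).
GIT_SEQUENCE_EDITOR="sed -i -e 1d" git rebase -q -i HEAD~3

git log --format=%s      # Commit 4, Commit 3, Commit 1
```

In day-to-day use you would simply run git rebase -i HEAD~3 and delete the line in your editor; the environment variable is only there so the example runs unattended.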
You can also reorder commits, if you don’t like the way you committed them: for instance, there was a case where I actually forgot to run the tests after I modified some piece of code, which is terrible of me, and after I’d run them, I realized, okay, I need to fix the unit test, which I did, but then in my git history, I had the code change, and then the unit test fix, which I didn’t like. So at that point, I could simply reorder them–if you literally change the order of the lines, git will try to Rebase them in that order, and again, if you get any conflicts, that’s the point where you fix them. That’s not what I wanted to do in this example. I wanted to have one commit that had the code change and the appropriate unit test fix, so you can squash commits as well. You can go in and say, this pick, I will change to s, which will squash it with the previous commit. So during Rebase, git will basically take those two commits, create a new one, and it will prompt you to give it a new message as well.
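The squash case can be sketched the same way. Again the todo file is edited by sed instead of by hand, and GIT_EDITOR=true accepts the combined commit message that git proposes; the commit messages and file names are assumptions made for this example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "v1" > code.txt && git add code.txt && git commit -q -m "Initial"
echo "v2" > code.txt && git commit -q -am "Change the code"
echo "test" > test.txt && git add test.txt && git commit -q -m "Fix the unit test"

# Todo file for HEAD~2: line 1 is "Change the code", line 2 is "Fix the unit test".
# Changing pick to s on line 2 squashes it into the commit on line 1.
GIT_SEQUENCE_EDITOR="sed -i -e '2s/^pick/s/'" GIT_EDITOR=true git rebase -q -i HEAD~2

count=$(git rev-list --count HEAD)   # the two commits have become one
git log -1 --format=%B               # combined message of both commits
```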
That’s how you merge: you squash. And you can also edit a commit, you can literally say, “that second one, I didn’t like this (there was a typo or I didn’t like the message)” and you can edit, and when git applies them one by one, it will stop at that point, you can go and change your code, you can do whatever you want. You can write a new test and continue the Rebase, at which point it will create a new commit with the changes you made, and it will be like that’s the commit that you had.
Now this is very dangerous if you’re working with code that has already been pushed to a remote that other people have already checked out, because you’re rewriting history. So all the commit hashes are now different, and it’s like git doesn’t know what to do with that.
We actually used that when we inherited the Shazam project: it was a very old project from years ago, and one of the things that it had, which we didn’t like, was that it contained the keystore and the password of the keystore, which you should never ever do.
So we decided, enough is enough, we have many people now in the team, none of us should have access to the keystore and the password, it should be on the build box. So we went back into history, found the first commit that actually committed the keystore, removed that commit, changed the entire history of git–two or three years of commits–and then we made everyone check out a clean version of the branch, and now, it’s like it never happened.
Forcing labels. As we said before, a branch is not really anything special in git; it’s literally a name to a hash. So there’s a command called reset --hard, then give it the hash code or a relative name like HEAD~3. This will literally take the current label that you’re on and it will move it to that commit that you had.
In this example, it will move the master branch to that commit: technically, it moves the current branch that HEAD is on. To move another label that you’re not currently checked out on, reset won’t do it, because it only works on the current branch; instead you can say git branch -f, then branch one and this hash, and it will actually do that.
This can be very useful, mostly for local commits. So if you have, say, three commits, and you want to push two of them remotely because your third one is not finished, you can literally change your current master, one commit before, push that remotely, and then reset it to the current one, so you don’t mess up other people’s work.
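The label moves described here can be sketched as follows. Two things are assumptions of this example rather than part of the talk: the branch name branch_one, and the use of git branch -f to move a label that is not checked out (git reset only moves the branch you are currently on). The push in the middle is left out so the sketch needs no remote.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

for n in 1 2 3; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "Commit $n"
done
tip=$(git rev-parse HEAD)               # the unfinished third commit
git branch branch_one                   # a second label, not checked out

git reset -q --hard HEAD~1              # master, the branch we are on, moves back to Commit 2
master_now=$(git log -1 --format=%s)
# (this is the point where you would push the first two commits)

git branch -f branch_one HEAD~1         # move the other label without checking it out
other_now=$(git log -1 --format=%s branch_one)

git reset -q --hard "$tip"              # put master back on the third commit
echo "master was at: $master_now, branch_one is at: $other_now"
```

Keep in mind that reset --hard also overwrites the files in your workspace, so uncommitted changes are lost when you run it.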
Finding lost commits. This is getting back to the original point that git never forgets. git reflog gives you a log of references. This will print out a list of all the hashes that your HEAD was at; by default it prints out 5 or 10 of them. This is very useful for figuring out where your HEAD has been in the past, over whatever hours you were working. If you do a lot of switching between branches, and you may have forgotten to name something, or you deleted a branch by accident, and it’s still there, and you need to reference it, this is very useful. There’s a command here related to that, that I actually put on my slides: it’s called gc, where you can actually instruct git to garbage collect things that are not referenced, and it also does compacting, it does some compression. So running that will get rid of all these unreferenced commits.
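As an illustration of recovering a commit through the reflog, the sketch below deletes a branch by accident and then gives the lost commit a new label. The branch names experiment and rescued are invented for this example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > app.txt && git add app.txt && git commit -q -m "Base"
git checkout -q -b experiment
echo "idea" > idea.txt && git add idea.txt && git commit -q -m "Experimental work"
lost=$(git rev-parse HEAD)

git checkout -q master
git branch -q -D experiment        # the label is gone, the commit is not

git reflog                         # every position HEAD has been at, newest first
git branch rescued "HEAD@{1}"      # HEAD@{1}: where HEAD was before the last checkout
rescued=$(git rev-parse rescued)
```

As far as I know, git gc does not drop such commits immediately by default: entries that are still in the reflog, and recently created objects, are kept for a grace period before they are pruned.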
Updating other branches. This is very useful if you’re in master, and you’re doing some work, and you fetch, and you realize that someone has pushed to another branch that you also have checked out locally, especially if you’re working on both branches and you want to switch between them. The usual way to update that other branch is to check it out and then do a git merge or a git rebase, to get that label up to the right place so that you can continue your work. But you can use this syntax instead: you say git fetch origin, then the branch name, a colon, and the branch name again, and it will get the remote version of that branch, take your local label and move it up there, as long as that move is a “fast forward.” This is how it looks: you will move ongoing up to origin/ongoing.
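A sketch of updating a branch you are not on, assuming a local bare repository as the remote and a second clone playing the teammate. All paths, names and commit messages are invented; only the branch name ongoing comes from the talk.

```shell
set -eu

work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/mine"
cd "$work/mine"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
git remote add origin "$work/remote.git"

# Helper: append a line to notes.txt and commit it with the given message.
commit() {
  echo "$1" >> notes.txt
  git add notes.txt
  git commit -q -m "$1"
}

commit base
git branch ongoing
git push -q origin master ongoing

# A teammate clones the repository and pushes a new commit to ongoing.
git clone -q -b ongoing "$work/remote.git" "$work/teammate"
(
  cd "$work/teammate"
  git config user.name "Teammate"
  git config user.email "teammate@example.com"
  git config commit.gpgsign false
  commit "teammate change"
  git push -q origin ongoing
)

# We are still on master. Fast-forward the local ongoing label
# to the remote version without checking it out.
git fetch -q origin ongoing:ongoing

current_branch=$(git rev-parse --abbrev-ref HEAD)
ongoing_subject=$(git log -1 --format=%s ongoing)
echo "on $current_branch, ongoing -> $ongoing_subject"
```

If the local ongoing had commits of its own that the remote lacks, the fetch would be rejected as a non-fast-forward instead of moving the label.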
Taking a commit from another branch. This is very useful, regardless of whether you’re doing master-only or GitFlow. Sometimes you may have committed something by accident on a branch where it shouldn’t have been; or it should be there, but also in another branch that you’re not ready to merge yet. git cherry-pick, followed by a hash code, literally cherry-picks that commit onto your current HEAD: it will take that commit, clone it, and bring it on top of where you are. The hash code, of course, changes now, because it’s a different commit; remember, it points to a different parent. If there are any conflicts with that cherry-pick, that’s the point where it will ask you to fix them, and now your HEAD is there. This is very clever, because the two commits introduce the same change, so when you later merge the branches, git sees identical changes on both sides and will not actually try to complain.
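A small sketch of a cherry-pick in a throwaway repository. The branch name feature, the file fix.txt and the commit messages are hypothetical. It also uses git cherry to show that git can tell the copied commit is equivalent to the original even though the hashes differ.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Helper: append a line to notes.txt and commit it with the given message.
commit() {
  echo "$1" >> notes.txt
  git add notes.txt
  git commit -q -m "$1"
}

commit base

# A fix lands on a feature branch that is not ready to merge.
git checkout -q -b feature
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "fix crash"
fix=$(git rev-parse HEAD)

# Meanwhile master moves on.
git checkout -q master
commit "unrelated work on master"

# Copy just the fix onto master.
git cherry-pick "$fix" > /dev/null
picked=$(git rev-parse HEAD)
picked_subject=$(git log -1 --format=%s)

# "- <hash>" means: this feature commit already has an equivalent on master.
equivalent=$(git cherry master feature)
echo "original $fix, copy $picked"
echo "$equivalent"
```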
Deleting a remote branch. Sometimes when you have a branch that is no longer needed, you want to delete it remotely as well. You do a git branch -d, which is the command for deleting local branches. But if you push to origin, or whatever you called the remote repository, with :some_branch (a colon, then the name of a branch), this will actually remove that branch from the remote. Of course, it will remain in the repositories of everyone who has already fetched it. So the next command is for getting rid of those branches that have been deleted remotely but that you still have locally. prune says, take the remote, check my repository, and anything that no longer exists on the remote, just get rid of it. And this will delete the stale copy of that branch.
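Deleting a branch remotely and then pruning it elsewhere might look like this. It is a sketch with a local bare repository as the remote and a second clone as the teammate. The talk only says "prune", so the exact form used here, git remote prune origin, is an assumption about which variant was on the slide.

```shell
set -eu

work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/mine"
cd "$work/mine"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
git remote add origin "$work/remote.git"

echo "base" > notes.txt
git add notes.txt
git commit -q -m "base"
git branch some_branch
git push -q origin master some_branch

# A teammate clones while some_branch still exists on the remote.
git clone -q -b master "$work/remote.git" "$work/teammate"

# Delete the branch locally, then on the remote.
git branch -q -d some_branch
git push -q origin :some_branch

# The teammate still has a stale remote-tracking branch until they prune.
cd "$work/teammate"
before=$(git for-each-ref --format='%(refname:short)' refs/remotes/origin/some_branch)
git remote prune origin > /dev/null
after=$(git for-each-ref --format='%(refname:short)' refs/remotes/origin/some_branch)
echo "before prune: '$before', after prune: '$after'"
```

Pruning removes only the remote-tracking copy (origin/some_branch). A local branch the teammate created from it would still need git branch -d.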
Adding notes to old commits. This is very useful, for example, if you’re working on a bug, and you figure out it was introduced three years ago, and that commit is part of the history now. You don’t want to change it, because then, of course, you’re changing everyone’s checked-out work environments. What you can do is go back to an old commit and append notes to it by using the notes command. It’s fairly involved, there are a lot of variations of it, the documentation is online, and I urge you to go and check it out. This allows you to add data to old commits without changing their hash code. Because the notes that you add are not considered part of the actual commit itself, you can add as many of them as you want.
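A minimal sketch of attaching a note to an old commit, in a throwaway repository. The commit messages and the note text are made up. The point it tries to show is that the commit hashes stay exactly the same.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Helper: append a line to notes.txt and commit it with the given message.
commit() {
  echo "$1" >> notes.txt
  git add notes.txt
  git commit -q -m "$1"
}

commit "old change"
old=$(git rev-parse HEAD)
commit "newer change"
tip_before=$(git rev-parse HEAD)

# Attach a note to the old commit (hypothetical note text).
git notes add -m "This change introduced a bug that was found later" "$old"

note=$(git notes show "$old")
old_after=$(git rev-parse HEAD~1)
tip_after=$(git rev-parse HEAD)
echo "note: $note"
```

Notes live under their own ref (refs/notes/commits by default), so they have to be pushed and fetched explicitly if the team is meant to see them.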
Searching. This is very useful; we do most of it through the graphical tools that we have, but these are very useful commands to have on the command line. You can search the commit messages of all the commits in your history. If you’re like me, and you like all of your commits to mention the JIRA task, or the ticket from whatever project management tool you use, you can easily go and find all the commits that relate to a given ticket number. You can also search inside the commits themselves, and there are really nice variations here, where you can ask, when was the person repository first introduced, and there are syntaxes for the search. The link there will actually point to the slides where you can find that. You can find the first mention of a word or a sentence, which makes it easy to go back and find when a piece of code was introduced.
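The transcript does not name the exact search commands from the slides, so the following is a sketch using two standard options that match the description: git log --grep for commit messages and git log -S for the content of changes. The ticket keys (PROJ-1, PROJ-2) and file names are hypothetical.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "class Parser {}" > Parser.txt
git add Parser.txt
git commit -q -m "PROJ-1 add parser"

echo "class PersonRepository {}" > PersonRepository.txt
git add PersonRepository.txt
git commit -q -m "PROJ-2 add person repository"

echo "// uses PersonRepository" >> Parser.txt
git add Parser.txt
git commit -q -m "PROJ-2 wire parser to repository"

# All commits whose message mentions a ticket number.
by_ticket=$(git log --format=%s --grep='PROJ-2')
ticket_count=$(printf '%s\n' "$by_ticket" | wc -l | tr -d ' ')

# Oldest commit that added or removed an occurrence of a string in the code.
first_mention=$(git log --format=%s --reverse -S'PersonRepository' | head -n 1)

echo "commits for PROJ-2: $ticket_count"
echo "first mention: $first_mention"
```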
Lastly, when did we break it? This is one of the most useful features in git, and it is under-used: bisect. Again, if you’re like me, and you like having all your commits compilable and running, it’s very easy for you to figure out, when did we introduce this bug? You figure out there is a bug, and you know that you didn’t have it three releases ago. So what do you do? You bisect it. You start git bisect, you flag your current commit as bad, you find a commit hash or a release tag or something, and you say, that was good. And git will do a bisect: it checks out the middle commit in that range, allowing you to test if the bug still exists. If it does, you say git bisect bad, and it will then jump to the middle of the remaining half. It will keep doing that, and in six or seven steps you can trace through around a hundred commits and pinpoint the exact commit that actually introduced the bug. Very useful.
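A sketch of a bisect session in a throwaway repository with eight commits, where the fifth one introduces the "bug". The tag release-1 and the file status.txt are invented. Instead of answering good or bad by hand at each step, this sketch lets git bisect run call a test command (exit code 0 means good, 1 means bad), which is an automation not mentioned in the talk.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Eight commits; from commit 5 onwards status.txt says "crashes".
i=1
while [ "$i" -le 8 ]; do
  if [ "$i" -ge 5 ]; then
    echo "crashes" > status.txt
  else
    echo "works" > status.txt
  fi
  echo "$i" > counter.txt
  git add status.txt counter.txt
  git commit -q -m "commit $i"
  if [ "$i" -eq 1 ]; then
    git tag release-1             # last release known to be fine
  fi
  i=$((i + 1))
done

git bisect start > /dev/null
git bisect bad > /dev/null                  # the current commit has the bug
git bisect good release-1 > /dev/null       # this release did not

# Let git drive the search: the command is the "test" for each candidate.
git bisect run grep -q works status.txt > /dev/null

first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1           # return to where we started
echo "first bad commit: $first_bad"
```

In a real project the test command would be a build or a unit test, which is why having every commit compile and run matters so much here.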
(There is a link to the slides, which are the old slides, but I will update them with newer versions.)
Q&A (38:34)
Q: When you moved to master-only, did you do code reviews? How did you do that?
There is a discussion around how you can do master-only with code reviews. We specifically didn’t, because we paired all the time, and we viewed that as a sort of code review. There are also arguments that pairing doesn’t guarantee the quality that a code review does, because, you know, with two people you can get the same tunnel vision that you would have by yourself. There is a way–and this is what I will be trying to get the ASOS team that I’m currently part of to go towards–where you can still do code reviews and maintain the main idea of master-only.
The idea of master-only is that you don’t have really long-running branches that go back days, with lots of pieces of work running in parallel. You can get that with feature branches that are small, as long as you keep rebasing them onto master: you take your three or four commits, always keep them up to date, and you can still do the code review at the end. Say you have three commits; they get merged back to master, but the idea is that you’re never going to have the situation where you have a history, then a branch, and, before that one was merged, another branch open at the same time. That is the thing that makes the tree really difficult to read. You can still have a branch that always gets moved on top, so every time you’ll see one line, and maybe a small side step until you get back there. You can still do code reviews. And that’s sort of my idea of how I’m going to try to approach this. I haven’t actually done that, so this will be an experiment.
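One possible shape of that experiment, sketched in a throwaway repository: a short-lived feature branch is rebased onto the latest master and then merged with a fast-forward, so the history stays a single line. The branch and file names are hypothetical, and the review step itself (which would happen between the rebase and the merge) is not modelled.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Helper: append a line to notes.txt and commit it with the given message.
commit() {
  echo "$1" >> notes.txt
  git add notes.txt
  git commit -q -m "$1"
}

commit base

# Small feature branch with its own commit.
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feature part 1"

# Master moves on while the feature is being reviewed.
git checkout -q master
commit "teammate work"

# Keep the feature on top of the latest master...
git checkout -q feature
git rebase -q master

# ...so that merging it is just moving the master label forward.
git checkout -q master
git merge -q --ff-only feature

merge_commits=$(git rev-list --merges --count master)
total_commits=$(git rev-list --count master)
tip_subject=$(git log -1 --format=%s)
echo "$total_commits commits on master, $merge_commits merge commits"
```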
Also, it helps when you have a co-located team. At Shazam, we were all in the same office, until one of our team members moved to the States, and if you’re all in the same space, it’s very easy to do pairing. When you are remote, you might want to enforce code reviews.
Q: Are there any tools that are better than SourceTree for viewing the git tree?
I personally just use the Android Studio history of git, because again, if you have master-only, it doesn’t matter. You could use the command line, and it’s still nice. There’s another tool called GitKraken that is very popular now; it started like a month ago, I think it’s open source. It’s very flashy.
But the main problem with all the git trees is that they try to squish all the commits down to as few lines as they can. So when you have many parallel lines, they don’t actually expand so you can actually see them like that diagram I had. I don’t know if any tool does that, because I’ve been used to having only master-only, I haven’t really invested time in figuring out that, but I think GitKraken is the newest one that I’ve heard of.