Monday, May 29, 2017

Microsoft's GVFS for git makes git a different thing

Microsoft is rather proud of GVFS (the Git Virtual File System), but I don't think they quite understand what they have done.

GVFS, in short, changes git into a different thing. The plain git is a distributed version control system. When combined with GVFS, git becomes... well, let's back up a bit.

A traditional, non-distributed version control system consists of a central repository which holds files, typically source code. Users "check out" files, make changes, and "check in" the revised files. While users have copies of the files on their computers, the central repository is the only place that holds all of the files and all of the revisions to the files. It is the one place with all information, and is a single point of failure.

A distributed version control system, in contrast, stores a complete set of files and revisions on each user's computer. Each user has a complete repository. A new user clones a repository from an existing team member and has a complete set of files and revisions, ready to go. The repositories are related through parent-child links; the new user in our example has a repository that is a child of the cloned repository. Each repository is a clone, except for the very first instance, which could be considered the 'root' repository. The existence of these copies provides redundancy and guards against the kind of failure that a traditional version control system risks with its single central repository.

Now let's look at GVFS and how it changes git.

GVFS replaces the local copy of a repository with a set of virtual files. The files in a repository are stored in a central location and downloaded only when needed. When checked in, the files are uploaded to the central location, not the local repository (which doesn't exist). From the developer's perspective, the changes made by GVFS are transparent. Git behaves just as it did before. (Although with GVFS, large repositories perform better than with regular git.)
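
To make the "downloaded only when needed" idea concrete, here is a toy model in Python. It is only a sketch of on-demand fetching in general, not GVFS's actual implementation; the class names CentralStore and VirtualWorkingTree, and all of their methods, are made up for illustration.

```python
class CentralStore:
    """Hypothetical stand-in for the central server that holds every file."""

    def __init__(self, files):
        self._files = dict(files)
        self.downloads = 0  # counts how many times contents were fetched

    def fetch(self, path):
        self.downloads += 1
        return self._files[path]

    def upload(self, path, data):
        self._files[path] = data


class VirtualWorkingTree:
    """Hypothetical working tree: knows file names up front, contents lazily."""

    def __init__(self, store, paths):
        self._store = store
        self._paths = set(paths)
        self._cache = {}  # contents downloaded so far

    def listdir(self):
        # Listing files needs only names, so nothing is downloaded.
        return sorted(self._paths)

    def read(self, path):
        if path not in self._paths:
            raise FileNotFoundError(path)
        # Download the contents the first time the file is actually opened.
        if path not in self._cache:
            self._cache[path] = self._store.fetch(path)
        return self._cache[path]

    def check_in(self, path, data):
        # A check-in goes straight to the central store.
        self._paths.add(path)
        self._cache[path] = data
        self._store.upload(path, data)
```

In this model, listing a tree of thousands of files costs no downloads at all; only the files a developer actually reads are fetched, and each is fetched once.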

Microsoft's GVFS changes the storage of repositories. It does not eliminate the multiple copies of the repository; each user retains their own copy. It does move those copies to the central server. (Or servers. Microsoft's blog entry does not specify.)

I suppose you could achieve the same effect (almost) with regular git by changing the location of the .git directory. Instead of a local drive, you could use a directory on an off-premises server. If everyone did this, if everyone stored their git repository on the same server (say, a corporate server), you would have something similar to git with GVFS. (It is not exactly the same, as GVFS does some other things to improve performance.)
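
Regular git can already do this through the --separate-git-dir option of git init, which puts the repository data somewhere other than inside the working directory. The Python sketch below builds and runs that command. The helper names and the example paths are hypothetical, and it assumes the corporate server is mounted as an ordinary directory.

```python
import subprocess


def init_command(work_tree, git_dir):
    """Build a 'git init' command that stores repository data in git_dir."""
    return ["git", "init", f"--separate-git-dir={git_dir}", str(work_tree)]


def init_with_separate_git_dir(work_tree, git_dir):
    """Run the command; requires git on PATH and a writable git_dir."""
    subprocess.run(init_command(work_tree, git_dir), check=True)


# Hypothetical paths: a local working tree, repository data on a shared mount.
# init_with_separate_git_dir("/home/me/project", "/mnt/corp/repos/me/project.git")
```

After such an init, the working directory holds only a small .git file pointing at the real repository, so every developer's history would live on the shared server while the source files stay local.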

Moving the git repositories off individual, distributed computers and onto a single, central server changes the idea of a distributed version control system. The new configuration is something in between the traditional version control system and a distributed version control system.

Microsoft had good reason to make this change. The performance of standard git was not acceptable for a very large team. I don't fault them for it. And I think it can be a good change.

Yet it does make git a different creature. I think Microsoft and the rest of the industry should recognize that.
