Showing posts with label git. Show all posts

Monday, May 29, 2017

Microsoft's GVFS for git makes git a different thing

Microsoft is rather proud of their GVFS filesystem for git, but I think they don't understand quite what it is that they have done.

GVFS, in short, changes git into a different thing. Plain git is a distributed version control system. When combined with GVFS, git becomes... well, let's back up a bit.

A traditional, non-distributed version control system consists of a central repository which holds files, typically source code. Users "check out" files, make changes, and "check in" the revised files. While users have copies of the files on their computers, the central repository is the only place that holds all of the files and all of the revisions to the files. It is the one place with all information, and is a single point of failure.

A distributed version control system, in contrast, stores a complete set of files and revisions on each user's computer. Each user has a complete repository. A new user clones a repository from an existing team member and has a complete set of files and revisions, ready to go. The repositories are related through parent-child links; the new user in our example has a repository that is a child of the cloned repository. Each repository is a clone, except for the very first instance, which could be considered the 'root' repository. The existence of these copies provides redundancy and guards against the single point of failure that the central repository represents in a traditional version control system.
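To make the parent-child idea concrete, here is a toy model in Python. It is only a sketch: the `Repository` class and its fields are my own invention for illustration, and real git stores a graph of commits identified by hashes, not a list of messages. It does show the property that matters here, which is that every clone carries the whole history.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Repository:
    """Toy stand-in for a repository: a name, a parent link, and all revisions."""
    name: str
    parent: Optional["Repository"] = None
    revisions: list = field(default_factory=list)

    def commit(self, message: str) -> None:
        # Commits are recorded locally; no server is involved.
        self.revisions.append(message)

    def clone(self, name: str) -> "Repository":
        # A clone copies every revision and remembers where it came from.
        return Repository(name=name, parent=self, revisions=list(self.revisions))


root = Repository("root")
root.commit("initial import")
root.commit("fix build script")

alice = root.clone("alice")   # child of root
bob = alice.clone("bob")      # child of alice

# Simulate losing the original repository: the clones still hold everything.
root.revisions.clear()
print(bob.parent.name, bob.revisions)
```

Wiping `root` leaves `alice` and `bob` untouched, which is the redundancy a central repository cannot offer.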

Now let's look at GVFS and how it changes git.

GVFS replaces the local copy of a repository with a set of virtual files. The files in a repository are stored in a central location and downloaded only when needed. When checked in, the files are uploaded to the central location, not the local repository (which doesn't exist). From the developer's perspective, the changes made by GVFS are transparent. Git behaves just as it did before. (Although with GVFS, large repositories perform better than with regular git.)
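The "downloaded only when needed" part can be sketched in a few lines of Python. This is not how GVFS is implemented (it works at the filesystem level); `VirtualWorkingTree` and its methods are hypothetical names, and a plain dictionary stands in for the central store. The sketch only illustrates the lazy-download idea as I have described it.

```python
class VirtualWorkingTree:
    """Toy model: every file looks present, but contents are fetched on first read."""

    def __init__(self, server_files: dict):
        self._server = server_files   # stands in for the central location
        self._cache = {}              # contents downloaded so far
        self.downloads = 0            # how many simulated downloads have happened

    def list_files(self) -> list:
        # Listing needs only the names, so nothing is downloaded.
        return sorted(self._server)

    def read(self, path: str) -> str:
        if path not in self._cache:
            self._cache[path] = self._server[path]   # simulated download
            self.downloads += 1
        return self._cache[path]


tree = VirtualWorkingTree({
    "README.md": "hello",
    "src/main.c": "int main(void) { return 0; }",
})
names = tree.list_files()        # no download yet
first = tree.read("README.md")   # first read triggers one download
again = tree.read("README.md")   # second read is served from the cache
print(names, tree.downloads)
```

A developer who touches only a handful of files in a very large repository pays for only those files, which is where the performance gain comes from.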

Microsoft's GVFS changes the storage of repositories. It does not eliminate the multiple copies of the repository; each user retains their own copy. It does move those copies to the central server. (Or servers. Microsoft's blog entry does not specify.)

I suppose you could achieve the same effect (almost) with regular git by changing the location of the .git directory. Instead of a local drive, you could use a directory on an off-premises server. If everyone did this, if everyone stored their git repository on the same server (say, a corporate server), you would have something similar to git with GVFS. (It is not exactly the same, as GVFS does some other things to improve performance.)
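For the curious, git can already place the repository data away from the working files: `git init` accepts a `--separate-git-dir` option. The Python sketch below only builds the command line; it does not run git. The mount point `/mnt/corp-git`, the per-user layout, and the helper names are assumptions of mine, not anything Microsoft or git prescribes.

```python
from pathlib import PurePosixPath


def shared_git_dir(shared_root: str, user: str, worktree: str) -> str:
    """Pick a per-user directory on the shared server for the repository data."""
    project = PurePosixPath(worktree).name
    return str(PurePosixPath(shared_root) / user / f"{project}.git")


def init_command(shared_root: str, user: str, worktree: str) -> list:
    """Build a `git init` command that keeps the repository data off the local drive."""
    git_dir = shared_git_dir(shared_root, user, worktree)
    return ["git", "init", "--separate-git-dir", git_dir, worktree]


# Hypothetical paths: a corporate file server mounted at /mnt/corp-git.
cmd = init_command("/mnt/corp-git", "alice", "/home/alice/work/widgets")
print(" ".join(cmd))
```

With this layout the working files stay on the developer's machine while the repository itself lives on the server. Each user still has their own copy; the copies simply sit side by side in one place.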

Moving the git repositories off of individual, distributed computers and onto a single, central server changes the idea of a distributed version control system. The new configuration is something in between the traditional version control system and a distributed version control system.

Microsoft had good reason to make this change. The performance of standard git was not acceptable for a very large team. I don't fault them for it. And I think it can be a good change.

Yet it does make git a different creature. I think Microsoft and the rest of the industry should recognize that.

Friday, December 30, 2011

The wonder of Git

I say "git" in the title of this post, but this is really about distributed version control systems (DVCS).

Git is easy to install and set up. It's easy to learn, and easy to use. (One can make the same claim of other programs, such as Mercurial.)

It's not the simple installation or operation that I find interesting about git. What I find interesting is the organization of the repositories.

Git (and possibly Mercurial and other DVCS packages) allows for a hierarchical collection of repositories. With a hierarchical arrangement, a project starts with a single repository, and then as people join the project they clone the original repository to form their own. They are the committers for their repositories, and the project owner remains the committer for the top-most repository. (This description is a gross over-simplification; there can be multiple committers and more interactions between project members. But bear with me.)

The traditional, "heavyweight" version control systems (PVCS, Visual SourceSafe, TFS) use a single repository. Projects that use these products tend to allow everyone on the project to check in changes -- there are no committers, no one specifically assigned to review changes and approve them. One can set policies to limit check-in privileges, although the mechanisms are clunky. One can set a policy to manually review all code changes, but the VCS provides no support for this policy -- it is enforced from the outside.

The hierarchical arrangement of multiple repositories aligns "commit" privileges with position in the organization. If you own a repository, you are responsible for changes; you are the committer. (Again, this is a simplification.)

Once you approve your changes, you can "send them up" to the next higher level of the repository hierarchy. Git supports this operation, bundling your changes and sending them automatically.

Git supports the synchronization of your repository with the rest of the organization, so you get changes made by others. You may have to resolve conflicts, but they would exist only in areas of the code in which you work.
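The flow of changes up and down the hierarchy can be sketched with a toy model in Python. The `Repo` class is hypothetical and greatly simplified: real git identifies commits by hash, merges divergent histories, and reports conflicts, none of which appears here. The sketch shows only the direction of travel, with `push` standing in for "send them up" and `pull` for synchronizing with everyone else.

```python
class Repo:
    """Toy repository: a list of commit messages plus a link to its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # A clone starts with a copy of everything its parent has.
        self.commits = list(parent.commits) if parent else []

    def commit(self, message):
        self.commits.append(message)

    def push(self):
        """Send the parent any commits it does not have yet."""
        missing = [c for c in self.commits if c not in self.parent.commits]
        self.parent.commits.extend(missing)
        return missing

    def pull(self):
        """Fetch any commits the parent has that this repository lacks."""
        missing = [c for c in self.parent.commits if c not in self.commits]
        self.commits.extend(missing)
        return missing


project = Repo("project")
project.commit("initial import")

alice = Repo("alice", parent=project)
bob = Repo("bob", parent=project)

alice.commit("add parser")   # approved locally by alice, the committer
sent = alice.push()          # goes up one level to the project repository
received = bob.pull()        # bob picks up alice's change from the project
print(sent, received, bob.commits)
```

Alice's change reaches Bob only by way of the project repository, so the project owner sees it first.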

The capabilities of distributed version control systems support your organization. They align responsibility with position, so that more authority brings more responsibility. (If you want to manage a large part of the code, you must be prepared to review changes for that code.) In contrast, the older version control systems provide nothing in the way of support, and sometimes require effort to manage the project as you would like.

This is a subtle difference, one that is not discussed. I suspect that there will be a quiet revolution, as projects move from the old tools to the new.