Subversion vs. Git: Myths and Facts
There are a number of Subversion vs. Git comparisons around the web and most of them are based on myths rather than facts. The list below is intended to bust some of these myths. Although it doesn't tell which version control system is better, it should help you to understand the actual state of affairs.
The particular delta compression algorithms used in the two version control systems differ in many details, but in general Subversion and Git store data in the same way. As a result, Subversion and Git repositories with equivalent data have approximately the same size. The exception is a repository that stores a lot of binary files, where the Subversion repository can be significantly smaller than the Git one (because Subversion’s xdelta delta compression algorithm works for both binary and text files).
Branches in Subversion are implemented with a Copy-On-Write strategy (referred to as ‘Cheap Copies’ in the svnbook). No matter how large a repository or project is, it takes a constant amount of time and space to make a branch. In fact, Subversion branches have been extremely cheap since version 1.0, and you can branch even for small bugfixes in a very busy and large project.
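As a minimal sketch of what a cheap copy looks like in practice, the script below creates a throwaway local repository and branches its trunk with a single server-side `svn copy`. The temporary directory, the `trunk`/`branches` layout and the branch name `bugfix-123` are illustrative assumptions, not something prescribed by the article.

```shell
#!/bin/sh
# Sketch: branch a throwaway local repository with a server-side copy.
# All paths and the branch name "bugfix-123" are hypothetical.
set -e

work=$(mktemp -d)
svnadmin create "$work/repo"
repo_url="file://$work/repo"

# Create the conventional trunk/ and branches/ folders in one commit.
svn mkdir -q -m "Create layout" "$repo_url/trunk" "$repo_url/branches"

# Branching is a URL-to-URL copy: no working copy is needed and
# no file contents are duplicated in the repository.
svn copy -q -m "Create bugfix branch" \
    "$repo_url/trunk" "$repo_url/branches/bugfix-123"

# List the branches to confirm that the copy exists.
branches=$(svn list "$repo_url/branches")
echo "$branches"
```

Because the copy happens inside the repository, the command takes the same form for a small project and for a very large one.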
3. It is required to manually specify the range of revisions when you merge two branches in Subversion
Starting with Subversion 1.5 (released in June 2008), Subversion implements the merge tracking feature and manual revision range specification is not required anymore. Moreover, Subversion 1.8 (released in June 2013) provides automatic reintegration merges that further simplify merging changes between branches.
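The following sketch shows a merge that relies on merge tracking, assuming a Subversion 1.8 or newer client. It builds a throwaway repository, commits one change on a branch, and merges the branch into trunk without any `-r` revision range. The repository layout, the branch name `feature` and the file name are hypothetical.

```shell
#!/bin/sh
# Sketch: merge a branch into trunk without specifying revision ranges.
# The repository layout, branch name and file name are hypothetical.
set -e

work=$(mktemp -d)
svnadmin create "$work/repo"
repo_url="file://$work/repo"

svn mkdir -q -m "Create layout" "$repo_url/trunk" "$repo_url/branches"
svn copy -q -m "Create feature branch" \
    "$repo_url/trunk" "$repo_url/branches/feature"

# Commit a change on the branch.
svn checkout -q "$repo_url/branches/feature" "$work/feature"
echo "new feature" > "$work/feature/feature.txt"
svn add -q "$work/feature/feature.txt"
svn commit -q -m "Add feature" "$work/feature"

# Merge the branch into a trunk working copy. No -r range is given:
# Subversion works out what to merge from the recorded merge history.
svn checkout -q "$repo_url/trunk" "$work/trunk"
cd "$work/trunk"
svn merge -q "^/branches/feature"
svn commit -q -m "Merge feature branch into trunk"

# The merge is recorded in the svn:mergeinfo property of the target.
mergeinfo=$(svn propget svn:mergeinfo .)
echo "$mergeinfo"
```

The recorded `svn:mergeinfo` value is what lets later merges skip revisions that have already been merged.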
Additional information: Branching and merging described in SVNBook
Starting with Subversion 1.7 (released in Oct 2011), working copies have centralized metadata storage and there is a single .svn directory in the root of the working copy.
Despite all the marketing buzz related to Git, such notable open source projects as FreeBSD and LLVM continue to use Subversion as their main version control system. About 47% of other open source projects use Subversion too (while only 38% are on Git). The numbers are much better for companies, because Subversion is the de facto standard enterprise version control system. Moreover, every month a number of companies migrate to Subversion from such version control systems as ClearCase and TFS.
6. Distributed version control systems are inherently superior to centralized ones such as Subversion
Distributed version control systems (DVCS) are just another approach to implementing revision control. As always, different approaches have their pros and cons. DVCS may be great for certain projects, but they have a number of limitations that become roadblocks for others: no access control, a full copy of the repository on every computer, no exclusive file locks and so on.
While Git is used for such renowned open source projects as the Linux Kernel, it does not
scale well for truly large projects. The Linux Kernel repository takes about 2 GB of
disk space and it is acceptable to have a full copy of such a repository on each developer’s
laptop. However, a problem arises when the repository size reaches hundreds of gigabytes.
This leads to a typical strategy of splitting large projects into a number of smaller Git
repositories and letting developers clone only their subset (see the list of GNOME repositories).
The obvious drawbacks of this strategy are the additional maintenance burden, loss of atomic
whole-project commits, inability to make consistent branches, etc.
In contrast, Subversion does not limit the size of the repository. There is no practical limit for Subversion repository size and multiple projects can be stored in a monolithic repository without any restrictions. For example, all the projects of the Apache Software Foundation are stored in a single Subversion repository.
But the question is, do you really need to store multiple projects in a single monolithic repository (monorepo)? The Git community insists that large monolithic repositories have become redundant and have to be split into multiple smaller repositories. However, such companies as Facebook and Google maintain their codebases in huge, monolithic repositories. There are a number of reasons for that. For example, some of the advantages of monolithic repositories are better code visibility, atomic large-scale refactorings, better dependency management and collaboration across teams. Read the article Why Google Stores Billions of Lines of Code in a Single Repository for further details.
While Git is successfully used for crowded open source projects with thousands of
developers, such as the Linux Kernel, it may not scale well for other large teams with
different workflows. In Git each developer must be up-to-date against the
entire upstream repository before promoting changes from a private repository.
If your team doesn’t follow the Integration-Manager or Dictator and Lieutenants
workflows, it will face a work slowdown because promoting to a ‘blessed’ public repository
will be effectively serialized.
Thanks to mixed-revision working copies, Subversion allows better concurrent work because only the individual files in question must be up-to-date before promotion.
In most cases merges become painful in Subversion only if you have file or folder renames in the merged branches. For historical reasons, Subversion doesn’t properly track file and folder renames (mostly because file renames rarely happened before refactorings were invented). The best practices to prevent tree conflicts during a merge are simple: limit file and folder renames in branches and prefer to refactor code in the trunk. It is important to note that improved merging and better tree conflict handling are hot features for the next Subversion release.
Git was initially designed as a low-level version control system, so it allows advanced users to do a lot
of hacky things but does not provide enough safety and abstraction for beginners and average users. Also
Git is widely criticized for its poorly designed and chaotic command line syntax. That leads to a longer
learning curve and could significantly increase the total cost of ownership for large teams with mixed levels
However, a complicated abstraction model is not mandatory for DVCS: the competing DVCS called Mercurial has a much more consistent abstraction model and provides a cleaner command line syntax. It is worth noting that Subversion is as easy and safe as any version control system should be.
Additional information: Mercurial vs Git: Why Mercurial?
Git is officially described as a stupid content tracker and it doesn’t care too much about
keeping the precise history of changes in your repositories. Such features as implicit file rename tracking
and ‘git rebase’ command make it hard to find out the true history of changes in your codebase.
In contrast, with Subversion you can always get exactly the same data from your repository as it was at any moment in the past. You can also easily trace all changes made to a particular file or folder, because Subversion history is permanent and always definite.
Because of the distributed nature of Git, each Git user has a full copy of the repository and effectively has
complete read access to the entire content of the repository. While this approach is sufficient for open source
repositories that rarely contain any confidential information, it may be unacceptable for most enterprise projects.
At the same time, Subversion provides a path-based authorization system that gives granular control over who is authorized to read and modify files in the repositories and is sufficient even for large enterprise installations.
Modern version control systems are based on the assumption that most of the versioned files are mergeable.
In other words, it should almost always be possible to merge two concurrent changes made to a single file.
This model is called Copy-Modify-Merge and it is used in both Subversion and Git.
The above assumption usually does not hold for binary files and that’s why Subversion provides support for the alternative Lock-Modify-Unlock model (implemented by means of the svn lock command and the predefined svn:needs-lock property). Since Git is inherently distributed, it does not support exclusive file locks at all. This makes it hard to adopt Git for enterprise projects where a lot of non-mergeable binary assets usually exist.
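A minimal sketch of the Lock-Modify-Unlock cycle with the `svn lock` command and the `svn:needs-lock` property is shown below. It uses a throwaway local repository; the file name `logo.psd` and the user name `alice` are hypothetical, and `--username` is passed only so that the lock has a well-defined owner in a local `file://` setup.

```shell
#!/bin/sh
# Sketch: Lock-Modify-Unlock for a non-mergeable file.
# The file name "logo.psd" and the user name "alice" are hypothetical.
set -e
export LC_ALL=C   # keep 'svn info' field labels in English

work=$(mktemp -d)
svnadmin create "$work/repo"
svn checkout -q "file://$work/repo" "$work/wc"
cd "$work/wc"

# Add a binary asset and mark it as requiring a lock before editing.
printf 'pretend this is binary data\n' > logo.psd
svn add -q logo.psd
svn propset -q svn:needs-lock '*' logo.psd
svn commit -q --username alice -m "Add an asset that requires locking"

# Lock: take the exclusive lock (the file becomes writable).
svn lock --username alice -m "Editing the logo" logo.psd
owner=$(svn info logo.psd | sed -n 's/^Lock Owner: //p')
echo "locked by: $owner"

# Modify and commit: the commit releases the lock by default.
printf 'edited\n' >> logo.psd
svn commit -q --username alice -m "Update the logo"
owner_after=$(svn info logo.psd | sed -n 's/^Lock Owner: //p')
echo "lock owner after commit: ${owner_after:-none}"
```

While the lock is held, commits to the same file from other users are rejected by the repository, which is the point of the model for files that cannot be merged.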
The following example is intended to show that the size difference between Subversion and Git repositories is insignificant. It compares the size of the official WordPress codebase repository, which is powered by Subversion, with the size of its mirror hosted on GitHub.
The Subversion and Git repositories are nearly the same size: 186 MB in Subversion (35599 revisions) vs. 169 MB in Git (32647 revisions). The Git repository is only 17 MB smaller than the corresponding Subversion repository, but it also contains fewer revisions.
| |Subversion|Git|
|---|---|---|
|Number of Revisions|35599|32647|
|Repository Size|195,153,948 bytes|177,922,471 bytes|
|Software Version|Subversion 1.9.2, 64 bit|Git 2.6.3, 64 bit|
|Comments|The repository was generated from the complete dump stream of the official WordPress repository. The repository has the Subversion 1.9 format with all the default settings.|The repository was simply cloned from GitHub. There are fewer revisions because part of the history is omitted in the mirror Git repository.|
As you can see, the difference in repository size is truly insignificant because the Git repository is only about 9% smaller than the corresponding Subversion one. That’s not a surprise, because both version control systems use generally the same data structures and algorithms to store data in repositories.
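The figures in the table can be checked with a few lines of shell arithmetic. The sketch below only reuses the byte and revision counts quoted above; the variable names are illustrative. It also divides each size by its revision count, which shows that the average size per revision is almost identical in the two repositories.

```shell
#!/bin/sh
# Sketch: check the size comparison using the numbers from the table.
svn_bytes=195153948
git_bytes=177922471
svn_revs=35599
git_revs=32647

# Absolute and relative size difference.
diff_bytes=$((svn_bytes - git_bytes))
percent=$(awk -v d="$diff_bytes" -v s="$svn_bytes" \
    'BEGIN { printf "%.1f", d / s * 100 }')
echo "Git repository is $diff_bytes bytes ($percent%) smaller"

# Average bytes per revision (integer division).
svn_per_rev=$((svn_bytes / svn_revs))
git_per_rev=$((git_bytes / git_revs))
echo "Subversion: $svn_per_rev bytes per revision"
echo "Git:        $git_per_rev bytes per revision"
```

With these inputs the difference is 17,231,477 bytes, or 8.8%, and the averages are 5482 and 5449 bytes per revision.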
The following example is intended to show that making a branch in Subversion is very cheap in time and space and that there is no significant difference when compared to Git. It compares the time and disk space required to branch the trunk of the official WordPress codebase repository, which is powered by Subversion, with those required to make a corresponding branch in its mirror hosted on GitHub.
| |Subversion|Git|
|---|---|---|
|Size before branch|195,153,948 bytes|177,922,471 bytes|
|Size after branch|195,155,256 bytes (grows by 1308 bytes)|grows by 360 bytes|
|Time to create branch|0.093 s|0.031 s|
|Size after commit to the branch|195,157,201 bytes (grows by 1945 bytes)|grows by 1419 bytes|
|Software Version|Subversion 1.9.2, 64 bit|Git 2.6.3, 64 bit|

Comments: The Subversion repository is located on the same computer as the working copy. Copy-On-Write works incrementally.
You may have noticed that branch creation in Subversion takes 62 milliseconds longer than in Git; however, this is still less than the average duration of a human eye blink. There is also a difference in disk space: the Subversion branch takes about 1 kilobyte more than the Git one, which is negligible in the age of terabyte disks. Both differences are therefore of no practical significance.
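The deltas quoted in this comparison follow directly from the measured values. The sketch below recomputes them with shell arithmetic; it uses only the numbers from the table, and the variable names are illustrative.

```shell
#!/bin/sh
# Sketch: recompute the branch cost figures from the measured values.
svn_before=195153948
svn_after_branch=195155256
svn_after_commit=195157201
git_branch_growth=360      # bytes, as reported for Git

# Repository growth caused by the branch and by the first commit to it.
svn_branch_growth=$((svn_after_branch - svn_before))
svn_commit_growth=$((svn_after_commit - svn_after_branch))

# Extra cost of a Subversion branch compared to a Git branch.
extra_bytes=$((svn_branch_growth - git_branch_growth))
extra_ms=$((93 - 31))      # 0.093 s vs. 0.031 s, in milliseconds

echo "Subversion branch: +$svn_branch_growth bytes"
echo "Subversion commit: +$svn_commit_growth bytes"
echo "Extra vs. Git:     $extra_bytes bytes, $extra_ms ms"
```

The extra cost works out to 948 bytes and 62 milliseconds per branch.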
Git tracks file renames implicitly and it can be surprisingly easy to lose history for files that are committed with both a rename and a significant content change. The reproduction script is simple:
Init a new Git repository:
$ git init
Create a new text file with some initial content:
$ echo Initial content > file1.txt
Add the created file to Git and commit it with the simple message.
$ git add file1.txt
$ git commit -m "first change"
[master (root-commit) b7fe376] first change
 1 file changed, 1 insertion(+)
 create mode 100644 file1.txt
Rename the text file:
$ git mv file1.txt file2.txt
Replace the content of the renamed file:
$ echo Very new data > file2.txt
Commit the renamed file to Git:
$ git add file2.txt
$ git commit -m "second change"
[master 967f823] second change
 2 files changed, 1 insertion(+), 1 deletion(-)
 delete mode 100644 file1.txt
 create mode 100644 file2.txt
Examine the history of the renamed file and find out that ‘git log’
does not show the “first change” commit for the renamed text file:
$ git log --follow file2.txt
commit 967f8231ee59f4f0b97cff2ce72a152c74298820
Author: John Doe <email@example.com>
Date:   Tue Nov 17 10:13:07 2015 -0500

    second change
As you can see, the history is partially lost after a file rename and there is no simple way to find out the previous name of the file and the changes made to its content.
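The steps above can be collected into one non-interactive script that runs in a temporary directory, which makes the behavior easy to re-check with your own Git version. This is a sketch: the temporary directory and the per-repository identity settings are assumptions added for reproducibility. The last command shows that the earlier commit is still reachable when you query by the old path, but that only helps if you already know the previous name.

```shell
#!/bin/sh
# Sketch: reproduce the rename scenario in a throwaway repository.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
# Local identity so that commits work on a machine without global config.
git config user.name "John Doe"
git config user.email "email@example.com"

echo "Initial content" > file1.txt
git add file1.txt
git commit -q -m "first change"

# Rename the file and replace its content in the same commit.
git mv file1.txt file2.txt
echo "Very new data" > file2.txt
git add file2.txt
git commit -q -m "second change"

# History of the new name: the rename is not detected because
# the old and new contents have nothing in common.
followed=$(git log --follow --format=%s -- file2.txt)
echo "log --follow file2.txt: $followed"

# History of the old name, usable only if that name is already known.
old_path_log=$(git log --format=%s -- file1.txt)
echo "log -- file1.txt:"
echo "$old_path_log"
```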
Below are the links to some noteworthy posts where experienced Git users express a considered opinion on Git and Subversion:
- A year of using Git: the good, the bad, and the ugly, by Ivan Krivyakov (2016)
- Unorthodocs: Abandon your DVCS and Return to Sanity, by Benjamin Pollack (2015)
- Proud to be a Moron – My Journey with Git, by Martin Kolb (2015)
- GIT: a Nightmare of Mixed Metaphors, by Jeffrey Ventrella (2013)
- 10 things I hate about Git, by Steve Bennett (2012)