This document discusses 7 approaches for handling large binary files in Git repositories:
1. Git Annex stores file contents separately from the repository and uses symlinks to reference them. Allows sharing between teams but requires learning new commands.
2. Git Large File Storage (LFS) stores file blobs on a separate server using pointers. Supported by GitHub but requires a custom server.
3. Git-bigfiles aimed to backport large file handling directly into Git but the project is abandoned.
4. Git-fat and Git-media work similarly to LFS using filters and storing files on remote servers like S3. They are older and less actively developed.
5. Git-bigstore and Git
7. Git-annex works by storing the contents of tracked files in a separate location. What is stored in the repository is a symlink to the key under that separate location.
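The mechanism described above can be sketched in a few lines of Python. This is a simplified illustration rather than git-annex's actual implementation: the `annex_add` helper, the `objects` directory and the `SHA256-<hex>` key format are all assumptions made for the example.

```python
import hashlib
import os
import tempfile


def annex_add(path, store_dir):
    """Move a file into a content-addressed store and leave a symlink behind.

    Hypothetical helper: the key format and store layout are simplified
    and do not match what git-annex really writes.
    """
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    key = "SHA256-" + digest
    os.makedirs(store_dir, exist_ok=True)
    target = os.path.join(store_dir, key)
    os.replace(path, target)  # the content now lives under its key
    os.symlink(target, path)  # the working tree keeps only a symlink
    return key


# Usage: create a throwaway "large" file and hand it to the store.
workdir = tempfile.mkdtemp()
video = os.path.join(workdir, "video.bin")
with open(video, "wb") as fh:
    fh.write(b"large binary payload")

key = annex_add(video, os.path.join(workdir, "objects"))
```

After the call, `video.bin` is a symlink, and reading through it still returns the original bytes. Only the symlink would be committed to the repository.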
8. To share the large binary files within a team, the tracked files need to be stored in a different backend.
9. Pros
• Supports multiple remotes on which the binaries can be stored.
• Can be used without support from the hosting provider.
Cons
• Users need to learn separate commands for day-to-day work.
11. In Git LFS, instead of writing large blobs to a Git repository, only a pointer file is written. The blobs are written to a separate server using the Git LFS HTTP API. The API endpoint can be configured based on the remote, which allows multiple Git LFS servers to be used.
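To make the idea of a pointer file concrete, here is a minimal Python sketch that builds the text Git LFS commits in place of a blob, following the layout of the published pointer format (a version line, a SHA-256 object ID and the size in bytes). The `lfs_pointer` function is a hypothetical helper written for this example; the real client is a separate program and also handles the upload.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return the pointer text that would stand in for `content`.

    Hypothetical helper: it only formats the pointer and does not
    talk to any Git LFS server.
    """
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


blob = b"\x00" * 1024  # stand-in for a large binary file
pointer = lfs_pointer(blob)
print(pointer)
```

The pointer stays a few lines long no matter how large the blob is, which is why the repository itself remains small.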
12. Git LFS requires a specific server implementation to communicate with. It uses filters, so the tracked files only need to be specified once, with a single command.
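The single command in question is `git lfs track`, which records a filter rule in `.gitattributes`. The Python sketch below mimics that effect to show what ends up in the file. The `track` function is a hypothetical stand-in, not part of Git LFS, and the `*.psd` pattern is only an example.

```python
import os
import tempfile


def track(pattern, repo_dir):
    """Append an LFS filter rule for `pattern` to .gitattributes.

    Hypothetical helper that imitates the visible result of
    `git lfs track`; calling it twice does not duplicate the rule.
    """
    rule = f"{pattern} filter=lfs diff=lfs merge=lfs -text"
    path = os.path.join(repo_dir, ".gitattributes")
    existing = []
    if os.path.exists(path):
        with open(path) as fh:
            existing = fh.read().splitlines()
    if rule not in existing:
        with open(path, "a") as fh:
            fh.write(rule + "\n")
    return rule


repo = tempfile.mkdtemp()  # stand-in for a repository root
track("*.psd", repo)
track("*.psd", repo)  # second call leaves the file unchanged
with open(os.path.join(repo, ".gitattributes")) as fh:
    attributes = fh.read()
```

Once the rule is committed, every file matching the pattern passes through the LFS filters on `git add` and checkout without further commands.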
13. Pros
• Backed by GitHub.
• Prebuilt binaries available for multiple operating systems.
• Easy to use.
• Transparent usage.
Cons
• Requires a custom server implementation to work.
• API not stable yet.
• Performance penalty.
15. Git-bigfiles aims to make life bearable for people using Git on projects with very large files, while merging as many changes as possible back into upstream Git.
16. Git-bigfiles is a fork of Git; however, the project seems to have been untouched for some time.
17. Pros
• If the changes were backported, they would be supported by native Git operations.
Cons
• The project is dead.
• Being a fork of Git, it might cause compatibility issues.
• Only allows configuring a file-size threshold when tracking large files.
19. Git-fat works in a similar manner to Git LFS. Large files can be tracked using filters in the `.gitattributes` file. Large files are stored on any remote that can be reached through rsync.
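The filter approach relies on Git's clean and smudge filters: a clean filter runs when a file is staged and a smudge filter runs when it is checked out. The Python sketch below shows the general shape of such a pair under stated assumptions. The placeholder format, the `clean` and `smudge` function names and the local store directory are hypothetical and do not reproduce git-fat's real on-disk format.

```python
import hashlib
import os
import tempfile

PREFIX = b"#placeholder "  # hypothetical marker, not git-fat's real format


def clean(content: bytes, store_dir: str) -> bytes:
    """Staging side: stash the content and return a small placeholder."""
    digest = hashlib.sha1(content).hexdigest()
    os.makedirs(store_dir, exist_ok=True)
    with open(os.path.join(store_dir, digest), "wb") as fh:
        fh.write(content)
    return PREFIX + digest.encode() + b"\n"


def smudge(placeholder: bytes, store_dir: str) -> bytes:
    """Checkout side: swap the placeholder back for the stored content."""
    if not placeholder.startswith(PREFIX):
        return placeholder  # not a placeholder, pass through unchanged
    digest = placeholder[len(PREFIX):].strip().decode()
    with open(os.path.join(store_dir, digest), "rb") as fh:
        return fh.read()


store = tempfile.mkdtemp()           # stand-in for the local object store
original = bytes(range(256)) * 100   # stand-in for a large binary file
stub = clean(original, store)        # what the repository would record
restored = smudge(stub, store)       # what the working tree would get back
```

Synchronizing the store directory with a remote, which git-fat does over rsync, is a separate step and is left out of this sketch.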
22. Git-media is probably the oldest of the available solutions. It also uses a filter approach, and supports Amazon S3, a local filesystem path, SCP, Atmos and WebDAV as backends for storing large files.
23. Pros
• Supports multiple backends.
• Transparent usage.
Cons
• No longer developed.
• Ambiguous commands (e.g. `git update-index --really-refresh`).
• Not fully Windows compatible.
25. Git-bigstore was initially implemented as an alternative to git-media. It also works by storing a filter property in `.gitattributes` for certain file types.
26. Git-bigstore supports Amazon S3, Google Cloud Storage, or a Rackspace Cloud account as backends for storing binary files. Git-bigstore claims to improve stability when multiple people collaborate.
27. Pros
• Requires only Python 2.7+.
• Transparent usage.
Cons
• Only cloud-based storage is supported at the moment.
28. Git-sym is the newest player in the field, offering an alternative to how large files are stored and linked in git-lfs, git-annex, git-fat and git-media. Instead of calculating the checksums of the tracked large files, git-sym relies on URIs.
29. The benefits of git-sym are performance and the ability to symlink whole directories. Because of its nature, however, its main drawback is that it does not guarantee data integrity.
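The trade-off between URI-based and checksum-based identification can be illustrated with a small Python comparison. This is a conceptual sketch only and not git-sym's code: the `uri_key` and `checksum_key` functions and the example URI are hypothetical.

```python
import hashlib


def checksum_key(content: bytes) -> str:
    """Identify a file by its content; requires reading every byte."""
    return hashlib.sha256(content).hexdigest()


def uri_key(uri: str) -> str:
    """Identify a file by its location only; no content is read."""
    return uri


uri = "https://example.com/assets/model.bin"  # hypothetical location
original = b"version 1"
modified = b"version 2"  # the file behind the same URI has changed

# The URI is the same before and after the change, so the change is invisible.
uri_detects_change = uri_key(uri) != uri_key(uri)

# The checksum differs, so the change is detected.
checksum_detects_change = checksum_key(original) != checksum_key(modified)
```

Skipping the hash computation is where the performance gain comes from, and it is also why a silently changed or corrupted file goes unnoticed.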
30. Git-sym is operated through separate commands. It also requires Ruby, which makes it more tedious to install on Windows.
31. Pros
• Better performance than filter-based solutions.
• Support for multiple backends.
Cons
• Does not guarantee data integrity.
• Complex commands.
32. How have you solved the problem of storing large files in Git repositories?