I have been interested in "git for binary data" for a while, mostly for ML/compu...

kortex · on May 17, 2020

+1 for DVC. Setting up the backing store can be some extra work if you are doing that yourself, but after that it's a breeze.

What do you use for the backing store?

Git-lfs has been a pain in my seat since my first use of it. Most of the issues stem from the pointer files that have to be filtered/smudged pre/post commit.

Haven't used git-annex myself, but I have heard from coworkers that cross-OS is a pain.

dimatura · on May 18, 2020

Mostly S3. I used to do SSH, but these days I can afford to keep the data in the cloud. I do appreciate the possibility of migrating to other stores if needed in the future, though - might have to soon, for $reasons.

unqueued · on May 18, 2020

Actually, a lot has updated about git-annex in the last few years. It actually does support git pointer files like git lfs, which makes it easier to when you want to modify binary files. In fact, it can even use git-lfs servers as one of its back-ends. However, I still prefer symlinks mode, because operations on them are faster because it bypasses the smudge filter.

Also, git-annex uses reflink copies whenever possible, on zfs, btrfs, or apfs. Also, since people were talking about p2p and git, git-annex does this amazing trick for syncing directly to other git-annex repos, even with the checked out branch. There is no need at all for a seperate server.

I have used git-annex for years on OSX, and have not found it be deficient in any way compared to linux.

dimatura · on May 18, 2020

Yeah, git-annex has a lot of cool features that I have yet to see in other systems. I still use it for some things. My main pain point on MacOS was that the symlink mode didn't work well with some apps that didn't understand symlinks. Obviously this is not git-annex's fault, but it still made it so I couldn't use it. I think I could try again at some point and see if I could get it to use reflinks -- maybe it's a version issue.

I also had weird conflicts with the line ending (the whole CR/LF annoyance) on some of the metadata files git-annex used which I couldn't fix, no matter how many .gitconfigs I tweaked. Again, this is not really git-annex's fault, I think.

chubot · on May 17, 2020

Thanks for the response! I have heard of dvc and git annex, and it's probably time to give them another try :)