Jul 15, 2024

Replacing GitHub LFS


GitHub sent me a scary email notification:

We wanted to let you know that you’ve used 80% of your data plan for Git LFS on your personal account […]

80% only one week into a monthly quota — oh dear! You might have seen my microblog note — this is the full article I promised. Basically GitHub’s free LFS is trash and you can do much better using any S3-like storage.

Reminder that Git and GitHub are not synonymous.

👉 Skip to my issue below if you’re familiar with Git LFS.

Git LFS

Git LFS (Large File Storage) is a sensible way to handle binary files in a repository. Binary files can't be diffed like text. They bloat the repo and make operations slow. With Git LFS a text pointer is committed instead of an unwieldy binary blob.

version https://git-lfs.github.com/spec/v1
oid sha256:74d5092785ba3bf068d6e2d913839fb636ba39dae789ba743b14241691a14798
size 35157

This references the real file which is stored on an LFS server. Using LFS helps avoid the repo ballooning in size. When you do git clone it checks out the entire version history but only the most recent LFS binaries.
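If you're starting from scratch, tracking files with LFS takes only a few commands (standard git-lfs CLI; the image pattern is just an example):

# one-time setup of the LFS hooks for the repo
git lfs install
# commit *.png files as LFS pointers from now on
git lfs track "*.png"
# the tracking rule itself lives in .gitattributes
git add .gitattributes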

My Issue

My website is a public repo on GitHub using their LFS. GitHub's free tier offers 1 GB of storage and 1 GB of bandwidth. That limit is account-wide, not per repository. What's more:

Bandwidth and storage usage only count against the repository owner’s account. In forks, bandwidth and storage usage count against the root of the repository network. […] Forking and pulling a repository counts against the parent repository’s bandwidth usage.

About billing for Git Large File Storage

If I’m reading that correctly, using GitHub LFS in a public repo is dangerous. Anyone could maliciously rack up charges.

1 GB of bandwidth is rather paltry anyway. My website had ~40 MB of images in GitHub LFS. My deployment process to Cloudflare Pages (admittedly somewhat inefficient) cloned the entire repo on every deploy. At 40 MB a clone, that gave me at most 25 deployments per month. Thankfully nobody forked or cloned my repo in the meantime.

This worked fine for years, until last week when I started a microblog and quickly hit 80% of my quota. Oops, that's untenable. This left me with two tasks:

  1. Stop using GitHub LFS immediately
  2. Make my deployment process more efficient

I completed the first task just in time.

GitHub bandwidth usage showing 0.99 GB of 1 GB quota

The Solution

As noted, I found a project called Git LFS S3 Proxy. This is seriously clever. It proxies any Amazon S3-compatible data store, allowing it to be used as Git Large File Storage (LFS). The project runs on Cloudflare Pages but, looking at the code, it could be adapted to run anywhere. It's a simple HTTP proxy using a small AWS library.

There are plenty of storage providers that support the S3 API. Cloudflare’s R2 free tier includes 10 GB and zero egress costs. I chose R2 for now.
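Creating the bucket itself is a one-liner with Cloudflare's Wrangler CLI (the bucket name here is hypothetical); the access tokens are generated in the dashboard:

# create an R2 bucket to hold LFS objects
npx wrangler r2 bucket create my-lfs-objects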

Before you run commands: BACKUP! I made several dog’s dinners before I understood what I was doing. Disclaimer: the following is not a full tutorial.

After creating the proxy and R2 bucket with access tokens, I ran a command to ensure I had everything from GitHub LFS in my local repo:

git lfs fetch --all
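It's worth sanity-checking what you have before switching anything over (standard git-lfs commands):

# list every file currently stored as an LFS pointer
git lfs ls-files
# show the LFS endpoint and environment in use
git lfs env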

Next I configured my local repo to point to my new LFS proxy server:

git config lfs.url 'https://...'
git config lfs.locksverify false

This will update the hidden .git/config file:

[lfs]
  locksverify = false
  url = https://...

I suppose you could edit that manually. I'm not sure if the locksverify config is required but it fixed an issue for me. There are additional steps if you're not already using LFS. Refer to the README for detailed instructions.

I then pushed my existing large files to the new R2 storage:

git lfs push --all origin

Auth tokens are passed to the proxy via the local lfs.url config. It's possible to commit a .lfsconfig file to the repo with a read-only token. This allows anyone to check out LFS files for a public repo. I cloned the repo on a second machine using this method and monitored network activity to confirm it was using the new R2 storage and not GitHub.
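For illustration, a committed .lfsconfig has the same shape as the lfs.url config, with the read-only credentials embedded in the URL (endpoint and token names here are made up):

[lfs]
  locksverify = false
  url = https://READONLY_KEY:READONLY_SECRET@my-proxy.example.com/my-bucket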

Success!

GitHub Support

At this stage the old LFS files are still on GitHub's servers and counting towards that 1 GB storage limit. They have the same hash IDs and are still visible when browsing the repo on github.com.

GitHub’s advice is:

To remove Git LFS objects from a repository, delete and recreate the repository. […] If you need to purge a removed object and you are unable to delete the repository, please contact support for help.

Just delete the repo! I opted to create a support ticket. Not too difficult, despite the Copilot "AI" support assistant acting like Clippy. Ugh, ** off please. Support resolved this within a couple of hours. Still a waste of time though; why don't they provide a button to push? It's not for my protection: I have a button to nuke the entire repo.

GitHub Action

I use a GitHub action to build and deploy my website to Cloudflare Pages. Previously it started with this step:

steps:
  - name: Checkout
    uses: actions/checkout@v4
    with:
      lfs: true

Not any more. The official checkout action does not respect .lfsconfig 👈 an issue opened four years ago, btw. I've left a comment there with my workaround, noted below.

I removed lfs: true and added a second step:

- name: Checkout LFS
  run: git lfs fetch --all && git lfs checkout

This fixed my deployment.

Not content, I went a step further and removed .lfsconfig and destroyed the read-only token. I created a new token and added that as an action secret. I updated my action to generate a temporary .lfsconfig on the fly:

run: |
  echo -e "[lfs]\n\turl = https://${LFS_ACCESS_KEY}:${LFS_SECRET_KEY}@${LFS_ENDPOINT}" > .lfsconfig
  git lfs fetch --all && git lfs checkout
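Those variables have to come from somewhere. A sketch of the same step, assuming the tokens are stored as repository secrets under matching names:

env:
  LFS_ACCESS_KEY: ${{ secrets.LFS_ACCESS_KEY }}
  LFS_SECRET_KEY: ${{ secrets.LFS_SECRET_KEY }}
  LFS_ENDPOINT: ${{ secrets.LFS_ENDPOINT }}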

Ha! Now nobody can check out my large files but me and my action.

Task Done

What I love about this solution is that any S3-compatible storage can be used: Amazon S3, Cloudflare R2, Backblaze B2, and many more. They all have free tiers far more generous than GitHub LFS. The HTTP proxy could be implemented to run anywhere. Looking at the Git LFS API spec, I'm almost tempted to write and self-host my own server. I only need it to be online during deployments.
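For the curious, the spec is mostly a single "batch" endpoint. The client POSTs the object IDs it wants and the server replies with short-lived transfer URLs. Roughly, abridged from the spec and using the pointer example above:

POST {lfs.url}/objects/batch
Accept: application/vnd.git-lfs+json

{
  "operation": "download",
  "transfers": ["basic"],
  "objects": [
    {"oid": "74d5092785ba3bf068d6e2d913839fb636ba39dae789ba743b14241691a14798", "size": 35157}
  ]
}

The response points the client at the actual storage:

{
  "transfer": "basic",
  "objects": [
    {
      "oid": "74d5092785ba3bf068d6e2d913839fb636ba39dae789ba743b14241691a14798",
      "size": 35157,
      "actions": {
        "download": {"href": "https://...", "expires_in": 3600}
      }
    }
  ]
}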

Say goodbye to vendor lock-in.

Update for 25th July 2024

I’ve also moved my markdown files to a private submodule.

I then coded a Git LFS server in TypeScript!

Oh yeah — that second task I mentioned. Anyone know if it’s possible to do a partial deploy to Cloudflare Pages? i.e. only upload new files. Seems wasteful to be pushing 40 MB of images around every time I write a note. Shout me on Mastodon if you know.
