Jul 19, 2024

Private Submodules

From dbushell.com

For years now both my website source code and content have lived side by side in harmony within a public GitHub repository. This was convenient for the various build and deploy scripts I’ve monkeyed with.

Fast-forward to July 2024, when I was forced to replace GitHub LFS with alternate storage. 40 MB of my blog images are no longer on GitHub.

This got me thinking about my blog markdown files. Given the new threat AI poses to copyright, GitHub is just one more avenue for thieves. Evidently no license, or lack thereof, actually matters. I’m okay with my source code on GitHub. It’s there for other developers to see my crazy site generator experiments and possibly learn (bad lessons) from. My content is more valuable. If “AI” is going to steal it I’m not making it easy.

I’ve taken some temporary measures until I move to a fully self-hosted solution.

Git Submodules

The solution I came to was Git submodules. In my experience they’re a bit of a headache, but I got it working.

First I created a new private repo and committed my markdown files there. I then deleted the markdown and parent directory from my original repo. Next I added the submodule to match the original directory path. I ran the update command to pull down the markdown.

git submodule add git@github.com:dbushell/dbushell.com-data.git ./src/data
git submodule update --init --recursive

Exposing the private submodule name doesn’t matter. Now if you browse to src/data on GitHub you’ll see a reference like data @ 7a9c0da that links to a 404 page (unless you’re me).
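To see what the public repo actually records, the same steps can be rehearsed with throwaway local repositories standing in for the two GitHub repos. This is a minimal sketch, not the exact commands used for this site: the directory names, file contents and commit identity are made up, and protocol.file.allow=always is only there because recent Git refuses to clone submodules from local paths by default.

```shell
#!/bin/sh
# Sketch: rehearse the submodule move with throwaway local repos.
# Assumption: local directories stand in for the two GitHub repos (no network).
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# The "private" repo holding the markdown.
git init -q data
echo '# Hello' > data/post.md
git -C data add post.md
git -C data commit -q -m 'Add markdown'

# The "public" site repo, after the old src/data directory was deleted.
git init -q site
mkdir -p site/src
echo 'export {};' > site/src/build.ts
git -C site add src
git -C site commit -q -m 'Site source'

# Add the submodule at the original path. protocol.file.allow=always is
# only needed because this demo clones from a local path.
git -C site -c protocol.file.allow=always submodule add "$work/data" src/data
git -C site commit -q -m 'Add data submodule'

# The public repo records a gitlink (mode 160000), not the markdown files.
git -C site ls-tree HEAD src/data
cat site/.gitmodules
```

The ls-tree line should report mode 160000 for src/data: the public repo stores only a commit hash for the submodule, never the markdown itself. That hash is the short reference GitHub renders and links to.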

For good measure I deleted the hidden .git directory, reinitialised with git init, and reconfigured LFS and the submodule. I force-pushed this to GitHub, effectively destroying all previous Git history. GitHub actually exposes activity logs that leak history if you know where to look. But whatever, it’s not easily reconstructed.
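Deleting .git is destructive, so it is worth rehearsing on throwaway repos first. The sketch below assumes local stand-ins for GitHub, uses main and the commit messages as placeholders, and leaves LFS out. One gotcha: a submodule’s Git data lives inside the parent’s .git/modules directory, so deleting .git also breaks the submodule checkout, which is why src/data is removed and added again.

```shell
#!/bin/sh
# Sketch: wipe a repo's history, re-add its submodule, and force-push.
# Assumptions: local throwaway repos stand in for GitHub; LFS is left out.
set -eu

work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Stand-ins: a bare "GitHub" remote and the private content repo.
git init -q --bare origin.git
git init -q data
echo '# Post' > data/post.md
git -C data add post.md
git -C data commit -q -m 'Markdown'

# A site repo with some history worth hiding, already pushed.
git init -q site
cd site
echo 'v1' > index.ts && git add index.ts && git commit -q -m 'Old commit 1'
echo 'v2' > index.ts && git add index.ts && git commit -q -m 'Old commit 2'
git -c protocol.file.allow=always submodule add "$work/data" src/data
git commit -q -m 'Add submodule'
git remote add origin "$work/origin.git"
git push -q origin HEAD:main

# The reset. The submodule's Git data lived under .git/modules, so the old
# checkout is unusable once .git is gone: remove it and add it again.
rm -rf .git src/data
git init -q
# With Git LFS you would also re-run "git lfs install" here (omitted).
git -c protocol.file.allow=always submodule add "$work/data" src/data
git add .
git commit -q -m 'Fresh start'
git remote add origin "$work/origin.git"
git push -q --force origin HEAD:main

# The remote branch now has a single commit.
git -C "$work/origin.git" rev-list --count main
```

Only the fresh commit is reachable from main afterwards, but the old objects can linger on the remote until it garbage-collects them, so history is not guaranteed to vanish from a host straight away.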

GitHub Action

For local development I can access both repositories with my SSH key. For deployment I need to give the GitHub Action access. Under account Settings > Developer Settings > Fine-grained tokens I created a token with read-only permission to the private repo. I added that as an Actions secret named PAT_TOKEN and updated my deploy action.

steps:
  - name: Checkout
    uses: actions/checkout@v4
    with:
      submodules: recursive
      token: ${{ secrets.PAT_TOKEN }}

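This worked almost the first time. I had to correct a whitespace issue because YAML is a terrible format. submodules: recursive is obviously necessary. The token is necessary too, I believe: the default GITHUB_TOKEN only has access to the repository running the workflow, so it can’t read a separate private repo. The action logs do say “Setting up auth for fetching submodules”.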
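The .gitmodules URL is an SSH address, while PAT_TOKEN is an HTTPS credential. Git’s url.<base>.insteadOf setting is one way to bridge the two, and actions/checkout appears to do something similar when given a token. The sketch below only checks the rewrite offline with git ls-remote --get-url; the token value is a placeholder and the x-access-token user name is an assumed convention. Putting a token in a URL risks leaking it into logs, so treat this as illustrative.

```shell
#!/bin/sh
# Sketch: rewrite an SSH submodule URL to token-authenticated HTTPS.
# Assumption: PAT_TOKEN is a placeholder; a real CI job would read a secret.
set -eu

PAT_TOKEN=${PAT_TOKEN:-example-token}

repo=$(mktemp -d)
git init -q "$repo"

# Any URL starting with "git@github.com:" is rewritten to HTTPS with the token.
# This is repo-local config; CI scripts often use --global instead.
git -C "$repo" config url."https://x-access-token:${PAT_TOKEN}@github.com/".insteadOf "git@github.com:"

# Expand the URL using the insteadOf rule, without contacting the remote.
rewritten=$(git -C "$repo" ls-remote --get-url git@github.com:dbushell/dbushell.com-data.git)
echo "$rewritten"
```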

My action now starts by:

  1. Cloning the public repo
  2. Cloning the private submodule with authentication
  3. Fetching Git LFS objects from private storage with authentication

That’s a little more complicated than before but not unmaintainable. I reckon in a few months I won’t be scratching my head as to how this works. It helps to “document” the process on my blog.

Results

With these changes my website remains the same. My custom build scripts are still available to read on GitHub. My images are in private large file storage. My markdown source is in a private submodule. Everything is still visible at dbushell.com with a copyright notice and robots.txt that I’m sure “AI” will continue to ignore.

I’m aware of the irony of using Microsoft’s GitHub. Nowhere is safe from AI. I’m working on a self-hosted solution.

Update for 25th July 2024

I coded a Git LFS server in TypeScript!
