Aug 13, 2024

How to Migrate Production Code to a Monorepo

From The DigitalOcean Blog (digitalocean.com)

In February 2024, the UI Platform team moved 1.3M lines of React micro-frontend code to a monorepo while retaining git history. Our team is responsible for the frontend architecture and UI Engineer experience at DigitalOcean, and moving to a monorepo is part of our frontend vision, much of which is drawn from Monica Lent’s Building Resilient Frontend Architecture talk. With a monorepo, we aimed to reduce our dependency management burden and simplify our micro-frontend boilerplate, ultimately increasing developer velocity.

While there are plenty of guides for getting started with monorepos, there are few that touch on migrating existing repositories over. This is the guide I wish I had when we started and I hope it helps someone else!

What is a monorepo?

A monorepo is a collection of isolated packages that live in a single repository. It reduces friction between shared code while keeping the safety gained from isolation. In contrast to a monolithic repository where the entire application is deployed as one, a monorepo allows packages to be deployed on their own.

Approach: moving to a monorepo

We’re fans of Kent Beck’s famous refactoring quote, “First make the change easy (warning this may be hard), then make the easy change”, and applied it to this work as best we could. In its essence, a monorepo is code colocation, so we restricted the actual migration to that alone; there would be no functional change in any of the apps but they would live next to each other. Any changes required to an app would get applied while it was in its own repo, so problems with colocation were isolated.

Our apps had been created over a period of roughly three years, and in many cases, lessons learned from newer apps were never applied to older ones. This created a fair bit of inconsistency, which added complexity to colocation and kicked off refactoring cycles. As we worked through the apps, each one needed to: run the local dev environment, tests, linters, and IDE plugins; run the CI/CD pipelines; and deploy to our staging environment. At least one of those steps broke whenever any two apps were colocated, so we’d refactor the independent repos until the problem was resolved. Eventually, any two apps worked together, which meant all of the apps worked together.

For this article, I’ll break the project into three stages, though some pre-migration steps only became apparent as we worked through the task:

  1. Pre-migration: making the change easy

  2. Migration: colocating the apps

  3. Post-migration: optimizing the monorepo

Pre-migration: making the change easy

Scripting

We made automation our guiding principle: every change needed to be run from a script so that it was reproducible from scratch. We used zx so we could use both Node and CLI tooling in the same script. As we solved problems through refactoring, we’d update the script and the template files (which mimicked the file structure of the future monorepo) and re-run it. We ran the script hundreds of times as it evolved, and because of this approach we were able to eliminate human error on the day of the final migration.

The script ran from an external repo so it wouldn’t be overwritten by force pushes, and performed the following steps:

  1. Initialized git in a temporary monorepo.

  2. Cloned each repo into a temporary folder.

  3. Removed things that would become irrelevant after migration and couldn’t be completed prior, like deleting yarn.lock and .nvmrc.

  4. Created a move commit that put all the files in the correct workspace folder.

  5. Merged the unrelated histories from local remotes.

  6. Copied the template files into the monorepo.

  7. Force-pushed the repository.

This is it, with annotations, in its entirety:

process.env.FORCE_COLOR = '1';

import { $, path, os, cd, spinner } from 'zx';

const SCRIPT_ROOT = path.resolve(__dirname);
const MONOREPO = path.join(os.tmpdir(), `monorepo-${Date.now()}`);
const REPO_PREFIX = 'git@github.com:username/';
const REPO_SUFFIX = '.git';

// Repo names to fetch from GitHub
const REPOS = ['repo-a', 'repo-b'];

// 1. Initialize git in monorepo (create the folder before cd-ing into it)
await $`mkdir -p ${MONOREPO}/apps/`;
cd(MONOREPO);
await $`git init`;
await $`git commit --allow-empty -m "Initial commit"`;
cd(SCRIPT_ROOT);

// Merge git histories loop
for (const repo of REPOS) {
  const repoUrl = `${REPO_PREFIX}${repo}${REPO_SUFFIX}`;
  const tempRepo = path.join(os.tmpdir(), `${repo}-${Date.now()}`);

  // 2. Clone the app into a temporary folder
  await $`mkdir -p ${tempRepo}`;
  await $`git clone ${repoUrl} ${tempRepo}`;
  cd(tempRepo);

  // 3. Remove these files and folders because they're no longer necessary and it speeds up this script
  // (-r is required because .github, node_modules, .yarn and build are directories)
  await $`rm -rf .gitignore .gitattributes .github .nvmrc yarn.lock node_modules .yarn build`;

  // try…catch so non-zero exit codes don't stop the script from continuing
  try {
    await $`git add .`;
    await $`git diff --staged --quiet || git commit -m "[${repo}]: Remove conflicting files" --no-verify`;
  } catch {}

  // 4. Create a move commit
  // In order to preserve git history accurately, we need to create a
  // move commit from the root of the sub-repo into a directory that
  // imitates the monorepo, i.e. from ./ to ./apps/<repo>/
  const mainBranch = (await $`git branch --show-current`).stdout.trim();
  await $`mkdir -p apps/${repo}`;
  await $`git ls-tree ${mainBranch} --name-only | xargs -I{} git mv {} apps/${repo}`;
  await $`git commit -m "[${repo}]: Move ${repo} to apps/${repo}"`;

  cd(MONOREPO);

  // 5. Merge git history using local remote so changes wouldn't break live codebases
  await $`git remote add ${repo} ${tempRepo}`;
  await $`git fetch ${repo}`;
  // Merge the branch detected above rather than assuming it is called main
  await $`git merge --allow-unrelated-histories ${repo}/${mainBranch}`;
  await $`git remote rm ${repo}`;
}

// 6. Copy template files
cd(SCRIPT_ROOT);
await $`cp -a monorepo-template/. ${MONOREPO}`;
cd(MONOREPO);

// Create fresh yarn.lock, yarn install exits with non-zero
try {
  await $`yarn install --refresh-lockfile`;
} catch {}

await $`git add .`;
await $`git commit -m "Init monorepo"`;

// 7. Rebuild the monorepo every time
await $`git remote add origin git@github.com:username/your-new-monorepo.git`;
await spinner(() => $`git push -f origin main`);

console.log('🎉 monorepo is live');

GitHub Actions workflows

Updating our CI/CD jobs in GitHub Actions to support running in both single- and multi-app repositories was one of the first tasks. We added a working-directory input parameter to our shared actions so each job would run from the application’s folder instead of the root, as if it were in a single-app repository. The input defaults to '.' for backwards compatibility.
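As a rough sketch of that backwards-compatible default, assuming a JavaScript step that receives the raw input value (resolveWorkingDirectory and stepOptions are hypothetical names for illustration, not code from our actions):

```javascript
// Hypothetical helper: fall back to '.' when no working-directory is supplied,
// so single-app repositories keep running from the repository root.
function resolveWorkingDirectory(input) {
  const value = (input ?? '').trim();
  return value === '' ? '.' : value;
}

// Build the options a step could pass to a child process (cwd = where to run).
function stepOptions(input) {
  return { cwd: resolveWorkingDirectory(input) };
}

console.log(stepOptions(undefined));     // single-app repo: runs from the root
console.log(stepOptions('apps/repo-a')); // monorepo: runs from the app folder
```

Keeping the default in one place means existing single-app workflows need no changes when the shared action gains the new input.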

Our deploy workflows had custom keys, like app_name and service_id, which were hard-coded strings in each repo’s deploy workflow. We extracted these values into another file and added a step to read them so our workflow actions could be generic.
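A minimal sketch of the "read them" step might look like the following. The JSON format, the example values, and the parseDeployConfig name are assumptions for illustration; only the app_name and service_id keys come from our workflows.

```javascript
// Hypothetical per-app deploy settings, e.g. the contents of a JSON file in the app folder.
const exampleFile = '{ "app_name": "repo-a", "service_id": "svc-123" }';

// Parse the settings and fail loudly if a required key is missing,
// so a generic workflow never deploys with a half-filled config.
function parseDeployConfig(jsonText, requiredKeys = ['app_name', 'service_id']) {
  const config = JSON.parse(jsonText);
  const missing = requiredKeys.filter(
    (key) => typeof config[key] !== 'string' || config[key] === ''
  );
  if (missing.length > 0) {
    throw new Error(`Deploy config is missing: ${missing.join(', ')}`);
  }
  return config;
}

console.log(parseDeployConfig(exampleFile));
```

With the values living in a file next to the app, the workflow itself contains no app-specific strings and can be shared unchanged.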

In the templated files, we built an action that detects which workspaces changed and returns a matrix that fires off subsequent jobs for only those workspaces. It reduced wasted GitHub Actions time and, more critically, prevented unnecessary deployments and e2e jobs from running.
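The core of such a detection step can be sketched as a pure function. This is an illustration under assumptions, not our action: buildMatrix is a hypothetical name, the list of changed files is assumed to come from something like git diff --name-only, and workspaces are assumed to live one level below apps/ or packages/.

```javascript
// Map changed file paths to the workspaces that contain them and build a
// matrix object with an "include" list, one entry per changed workspace.
function buildMatrix(changedFiles, workspaceRoots = ['apps', 'packages']) {
  const workspaces = new Set();
  for (const file of changedFiles) {
    const [root, name, ...rest] = file.split('/');
    // Only files nested inside <root>/<name>/ belong to a workspace.
    if (workspaceRoots.includes(root) && name && rest.length > 0) {
      workspaces.add(`${root}/${name}`);
    }
  }
  return { include: [...workspaces].sort().map((workspace) => ({ workspace })) };
}

const changed = ['apps/repo-a/src/index.tsx', 'apps/repo-a/package.json', 'README.md'];
console.log(JSON.stringify(buildMatrix(changed)));
// {"include":[{"workspace":"apps/repo-a"}]}
```

This sketch ignores root-level changes (such as a lockfile update) that may need every workspace to run, and a caller would likely want to skip downstream jobs entirely when include is empty.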

Yarn 4 upgrade

After a couple of days attempting to fix inter-app dependency conflicts in Yarn 1, we decided upgrading to Yarn 4 was a required milestone because of its improved workspaces support. With nmHoistingLimits set to workspaces, each app could contain conflicting dependencies, effectively running in isolation.

Yarn has a great migration guide, and the upgrade was painless for the most part. We broke the work into two pull requests per application: one to explicitly add undeclared dependencies as per Yarn’s rules, and one to complete the upgrade to Yarn 4. In practice, I upgraded each app locally, then ran yarn dlx @yarnpkg/doctor and npx depcheck to identify the missing packages. Once I had the list, I reinstalled them on a new branch to safely separate changed dependencies from the Yarn upgrade.
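To make "undeclared dependency" concrete, here is a simplified sketch of the kind of check those tools perform. It is not how @yarnpkg/doctor or depcheck are implemented; findUndeclared and the sample data are hypothetical, and the list of import specifiers is assumed to have been collected already.

```javascript
// Reduce an import specifier to its package name:
// 'lodash/get' -> 'lodash', '@scope/pkg/sub' -> '@scope/pkg'.
function packageName(specifier) {
  const parts = specifier.split('/');
  return specifier.startsWith('@') ? parts.slice(0, 2).join('/') : parts[0];
}

// List packages that are imported but not declared in package.json.
function findUndeclared(importSpecifiers, pkg) {
  const declared = new Set([
    ...Object.keys(pkg.dependencies ?? {}),
    ...Object.keys(pkg.devDependencies ?? {}),
    ...Object.keys(pkg.peerDependencies ?? {}),
  ]);
  const undeclared = new Set();
  for (const specifier of importSpecifiers) {
    // Relative paths, absolute paths and node:-prefixed built-ins are not packages.
    if (specifier.startsWith('.') || specifier.startsWith('/') || specifier.startsWith('node:')) continue;
    const name = packageName(specifier);
    if (!declared.has(name)) undeclared.add(name);
  }
  return [...undeclared].sort();
}

const pkg = { dependencies: { react: '^18.2.0' }, devDependencies: { jest: '^29.0.0' } };
const imports = ['react', 'lodash/get', '@scope/ui/button', './local', 'node:path'];
console.log(findUndeclared(imports, pkg)); // [ '@scope/ui', 'lodash' ]
```

Under Yarn 1's hoisting, an import like lodash/get can work by accident because another package pulled lodash in; the stricter rules surface exactly these entries. Note that this sketch would wrongly flag built-ins imported without the node: prefix.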

The way Yarn is installed has fundamentally changed between version 1 and 4, so I needed to support the team when they ran into issues upgrading on their machines. In all cases, the problems stemmed from location issues, typically with the wrong version of Yarn running. Node, Corepack, and Yarn all need to be installed within your Node version manager, like /Users/you/.nvm/versions/node/v20.9.0/bin/node. You can check the locations with:

which node
# should output something like /Users/you/.nvm/versions/node/v20.9.0/bin/node
# if you're using nvm and you get something else run:
nvm use

which corepack
# should output something like /Users/you/.nvm/versions/node/v20.9.0/bin/corepack
# if you get something else run:
corepack enable

which yarn
# should output something like /Users/you/.nvm/versions/node/v20.9.0/bin/yarn
# if you get something else run:
corepack install
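The same check can be expressed as a small function, which is handy when comparing output a teammate pastes into chat. This is only a sketch: misplacedTools is a hypothetical helper, and it assumes POSIX-style paths copied from the which commands above.

```javascript
// Return the directory part of a POSIX path.
function dirOf(filePath) {
  return filePath.slice(0, filePath.lastIndexOf('/'));
}

// Report tools that do not live next to the active node binary.
function misplacedTools(locations) {
  const expected = dirOf(locations.node);
  return Object.entries(locations)
    .filter(([, location]) => dirOf(location) !== expected)
    .map(([tool]) => tool);
}

const locations = {
  node: '/Users/you/.nvm/versions/node/v20.9.0/bin/node',
  corepack: '/Users/you/.nvm/versions/node/v20.9.0/bin/corepack',
  yarn: '/usr/local/bin/yarn', // e.g. a leftover global Yarn 1 install
};
console.log(misplacedTools(locations)); // [ 'yarn' ]
```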

Migration: colocating the apps

Once all apps were running as expected, we announced a migration date and the full plan. Like Stripe’s migration from Flow to TypeScript, we wanted developers to leave Friday afternoon and start work Monday morning in the brand new codebase with no ceremony.

On the day of the migration, we posted each step in Slack so there was a clear record in case anything went wrong and so anyone watching could follow along. The steps were largely double-checks, but they included the actual migration too.

  1. We ran through one last review of the build script and template files then compared it against the last working run.

  2. We ran the script for the last time, rewriting the repo history again with a force-push.

  3. We manually kicked off the PR CI/CD pipeline to confirm all the apps pass.

  4. We manually ran the staging deploy jobs to ensure all the apps deployed.

  5. We turned on branch protection, merge checks, permissions, and other repo settings, as well as enabled our automatic CI/CD jobs.

  6. And finally, we archived the old app repositories.

We left instructions for getting started and held daily office hours the following week so engineers could drop in and troubleshoot. We also migrated a handful of open PRs that weren’t merged by the migration date with a few commands from the command line:

# From the archived repo, rebase your PR commits into a single commit:
# keep the first commit and change the prefix before every other sha to fixup
git rebase main -i

# Run the move commit so all files live within an ./apps/ directory like the monorepo
# This only moves changed files to reduce conflicts + commit noise
APP_NAME=REPLACE_THIS_WITH_YOUR_APP_NAME
for file in $(git diff main --name-only --cached); do
  target_path=$(dirname "$file")
  mkdir -p "apps/$APP_NAME/$target_path"
  git mv "$file" "apps/$APP_NAME/$target_path" -v
done

# Squash the commit to previous batch of PR commits
git commit --amend --no-edit

# Copy the sha output
SHA=$(git rev-parse --short HEAD)

# In the monorepo
cd monorepo

# Checkout a new branch that matches the original PR name
git checkout -b …

# Assuming the monorepo and original repo are in sibling folders, run
git --git-dir=../${APP_NAME}/.git format-patch -k -1 --stdout ${SHA} | git am -3 -k

# Then open the new PR
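For anyone who wants to preview what the move loop will do before running it, the path mapping can be sketched as a pure function. planMoves is a hypothetical helper written for this article, not part of the commands above.

```javascript
// For each changed file, compute the directory to create and the git mv target,
// mirroring the shell loop: files keep their relative path under apps/<appName>/.
function planMoves(appName, changedFiles) {
  return changedFiles.map((file) => {
    const slash = file.lastIndexOf('/');
    const dir = slash === -1 ? '' : file.slice(0, slash);
    const targetDir = dir === '' ? `apps/${appName}` : `apps/${appName}/${dir}`;
    return { from: file, mkdir: targetDir, to: `${targetDir}/${file.slice(slash + 1)}` };
  });
}

console.log(planMoves('repo-a', ['src/App.tsx', 'package.json']));
```

Because only the files touched by the PR are moved, the resulting patch lines up with the paths created by the monorepo's move commit and applies with fewer conflicts.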

Post-migration: optimizing the monorepo

We spent the few weeks after the migration tidying and optimizing the monorepo.

We installed dependency-cruiser to restrict the ability to reach into sibling modules through the file system and instead require standard package importing. This keeps our monorepo code isolated and prevents a ball-of-mud from forming. The rule that enforces that looks like:

{
  name: 'apps-not-to-apps',
  comment: 'One app should not reach into another app (in a separate folder)',
  severity: 'error',
  from: { path: '(^apps/)([^/]+)/' },
  to: { path: '^$1', pathNot: '$1$2' },
}
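To see how the capture groups in that rule behave, here is a simplified emulation in plain JavaScript. It is a sketch of the matching logic as I understand it (groups captured by from.path are substituted for $1 and $2 in the to patterns), not dependency-cruiser's implementation, and violatesRule is a hypothetical name.

```javascript
const rule = {
  from: { path: '(^apps/)([^/]+)/' },
  to: { path: '^$1', pathNot: '$1$2' },
};

// Replace $1, $2, ... in a pattern with the groups captured by the "from" match.
function substituteGroups(pattern, groups) {
  return pattern.replace(/\$(\d)/g, (_, index) => groups[Number(index)] ?? '');
}

// True when importing `toFile` from `fromFile` would break the rule.
function violatesRule(fromFile, toFile) {
  const match = fromFile.match(new RegExp(rule.from.path));
  if (!match) return false; // the rule only applies to files inside apps/
  const path = new RegExp(substituteGroups(rule.to.path, match));
  const pathNot = new RegExp(substituteGroups(rule.to.pathNot, match));
  return path.test(toFile) && !pathNot.test(toFile);
}

console.log(violatesRule('apps/repo-a/src/index.ts', 'apps/repo-b/src/util.ts')); // true
console.log(violatesRule('apps/repo-a/src/index.ts', 'apps/repo-a/src/util.ts')); // false
console.log(violatesRule('apps/repo-a/src/index.ts', 'packages/ui/index.ts'));    // false
```

For a file in apps/repo-a, $1 becomes apps/ and $2 becomes repo-a, so any import that lands in apps/ but outside apps/repo-a is reported.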

We moved packages and settings (like Prettier and Browserslist) that were duplicated in workspaces into the root directory, and then standardized them. We also abstracted developer dependencies (like eslint, stylelint, Cypress, and Jest) into isolated workspaces under ./packages, then imported them into each app with workspace:*. These new packages are self-contained, so all of their plugins and settings can be accessed with a single import and their versions are easy to keep track of.
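One way to keep that convention honest is a small check that internal packages are always referenced through the workspace: protocol. The sketch below is illustrative only: nonWorkspaceRanges and the @example/* package names are made up for this article.

```javascript
// List internal packages that an app depends on without the workspace: protocol.
function nonWorkspaceRanges(pkg, internalPackages) {
  const all = { ...pkg.dependencies, ...pkg.devDependencies };
  return internalPackages.filter(
    (name) => name in all && !all[name].startsWith('workspace:')
  );
}

const appPackageJson = {
  name: 'repo-a',
  devDependencies: {
    '@example/eslint-config': 'workspace:*',
    '@example/jest-config': '^1.0.0', // would resolve from the registry instead
  },
};

console.log(
  nonWorkspaceRanges(appPackageJson, ['@example/eslint-config', '@example/jest-config'])
); // [ '@example/jest-config' ]
```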

Our team also made several GitHub Actions improvements, as scaling problems surfaced immediately once our pipelines ran across multiple applications.

Finally, we added a .git-blame-ignore-revs file with the shas of batch commits so they would be skipped in git blame output.
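As a sketch of what goes into that file, the helper below renders one full sha per line with a comment above each. formatIgnoreRevs and the placeholder shas are hypothetical, and the 40-character check assumes a SHA-1 repository.

```javascript
// Render the contents of .git-blame-ignore-revs: one unabbreviated commit sha
// per line, with a '#' comment explaining each entry.
function formatIgnoreRevs(entries) {
  for (const { sha } of entries) {
    if (!/^[0-9a-f]{40}$/.test(sha)) {
      throw new Error(`Expected a full 40-character sha, got: ${sha}`);
    }
  }
  return entries.map(({ sha, reason }) => `# ${reason}\n${sha}`).join('\n\n') + '\n';
}

// Placeholder shas for illustration only.
const entries = [
  { sha: '1111111111111111111111111111111111111111', reason: 'Init monorepo' },
  { sha: '2222222222222222222222222222222222222222', reason: 'Fix all eslint errors and warnings' },
];
console.log(formatIgnoreRevs(entries));
```

For local blame, each engineer can point git at the file once with git config blame.ignoreRevsFile .git-blame-ignore-revs.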

Conclusion

This move took us one quarter to complete and was the largest frontend code migration at DigitalOcean thus far. We’ve seen the average number of React-related feature PRs increase by 1.6x, and the average number of internal library bumps decrease by 95%. While it’s harder to get an accurate measurement, each batch of our library bumps used to take most of the day and can now be released and upgraded in under an hour. Soon we will completely eliminate those bumps with Module Federation. It’s also been significantly easier and safer to do sweeping changes, like fixing all our eslint errors and warnings, or upgrading third-party libraries.

There’s always room for improvement and the two challenges we ran into were from the Yarn 4 upgrade and our CI/CD deploy pipeline. We hadn’t communicated how critical Yarn 4 was to the project and that it was our new norm for frontends, so we inadvertently left some team members behind with Yarn 1. When the monorepo launched, they were unable to get the repo running and we spent most of the first few days troubleshooting environments. Additionally, while we ran staging deploys both before and on the day of the migration, we failed to consider running production deploys which were slightly different. Our automated production pipeline was broken first thing Monday morning, but we luckily had it up again before lunch. For the next project, we’ve created a more robust release template that includes communication and support around required developer changes as well as better steps for the entire production process.

Breaking work down as if it were a refactor worked extremely well. We were able to keep track of progress (even as new tasks were added) and point to discrete batches of work for both issues and successes. The approach felt measured and straightforward with very little room for surprises or risk. There are still things to optimize across our frontend architecture and the monorepo is helping us move through it much faster. If you’re starting with a brand new repo, I’d like to recommend these four articles that helped us along the way:
