We all know how fast package installation is and how quickly specs complete when the project footprint is small. As the codebase grows, it starts affecting the following aspects of our project:
- package installation time
- linting (11 minutes)
- code coverage (13 minutes)
- build (2 minutes) + asset sync (10 minutes)
- and finally, deployment (2 minutes)
These stages, linked together, form a pipeline that enables continuous delivery, or CD for short.
Now the question arises: what is CI (continuous integration)? It can be defined as the process that kicks off continuous delivery. The foundation of CI/CD rests on four principles: frequent releases, automated processes, repeatability, and fast processing. As the project evolves, the fourth principle, fast processing, does not remain fast anymore. That is what we will tackle in this series of blogs by sharing the steps we took to improve pipeline processing time.
This is what a typical pipeline looks like:
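As a rough, hypothetical sketch of such a pipeline in `.gitlab-ci.yml` terms (the stage names here are illustrative, not our exact configuration):

```yaml
# Illustrative pipeline stages, run in order
stages:
  - lint      # static analysis
  - test      # specs + code coverage
  - build     # compile/bundle assets
  - deploy    # sync assets and release to production
```

Each stage must finish before the next begins, so time saved in any one stage shortens the whole pipeline.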
In most Node-based projects, the time taken for the pipeline to complete is often neglected. The pipeline for one of our projects was taking around 40 minutes, right from running the linter to deploying to production. This delayed shipping hotfixes and features to production.
This series is divided into two blogs:
- Caching in CI
- Faster asset syncing using AWS Lambda
Caching in CI
In a bigger codebase, the time taken to install all the packages from scratch is quite high. There are ways to reduce this time using caching techniques.
In our project, a fresh install took around 240 seconds, but reinstallation took only about 3 seconds because the node_modules directory was already present and ready to be consumed by the project.
What is Caching in GitLab?
Caching speeds up job execution by reusing content from a previous job. It is very useful when the software depends on packages that are fetched over the internet at build time. GitLab also allows the cache to be shared between pipelines and jobs.
For our CI/CD, we use GitLab, which allows caching at the following levels:
- Project level
- Job level
- Branch level
A typical lint and test stage configuration in GitLab would be as follows:
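A hedged reconstruction of such a configuration (job scripts are illustrative; the cache key and `.yarn-cache` path follow the description below):

```yaml
lint:
  stage: lint
  cache:
    key: yarn-cache-master
    paths:
      - .yarn-cache/
  script:
    - yarn install --cache-folder .yarn-cache
    - yarn lint

test:
  stage: test
  cache:
    key: yarn-cache-master
    paths:
      - .yarn-cache/
  script:
    - yarn install --cache-folder .yarn-cache
    - yarn test
```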
This used to take around 10 minutes for lint and around 13 minutes for the test stage.
If we look at the lint and test config above, we can see that we create a cache with the key yarn-cache-master, which caches the .yarn-cache folder; this helps improve yarn install times.
Let’s break down the lint stage
We do the following during the lint stage:
- Cloning repo (5–10s)
- Extracting .yarn-cache (180s)
- yarn install (90s)
- yarn lint (0.92s)
- uploading .yarn-cache (360s)
- Total time = 641s ~ 10.7 minutes
At the start of a job, the cache gets extracted and then at the end of the job the cache gets uploaded again. This runs for both test and lint job, increasing the job time.
It also becomes apparent that yarn lint takes less than a second to complete, whereas cache upload and download take up over 80 percent of the total time.
After digging a bit, we realized we could do the following:
- Remove unwanted cache upload operations
- Reduce package installation time
- Update node_modules cache only when required
- Speed up asset sync
Remove unwanted cache upload operations
GitLab’s cache has an interesting property known as policy. The policy accepts 3 values:
- pull-push (default)
- pull
- push
This property tells CI whether to download (pull) the cache, upload (push) it, or do both (pull-push).
In both stages, we cache the .yarn-cache directory to speed up installation. The way the cache works is: if an existing cache is found, it is downloaded from a remote store at the start of the job and uploaded again at the end.
After a bit of debugging and going through the documentation, we realized that the pull-push policy was in effect by default. So we swapped it out for a pull policy.
Just doing that saved us about 6 minutes of the total time.
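A minimal sketch of the change for the lint job (the test job gets the same treatment); only the policy line differs from the default behaviour:

```yaml
lint:
  stage: lint
  cache:
    key: yarn-cache-master
    paths:
      - .yarn-cache/
    policy: pull   # download the cache, but skip the upload at job end
  script:
    - yarn install --cache-folder .yarn-cache
    - yarn lint
```

With `policy: pull`, the ~360-second upload step disappears from both jobs entirely.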
Reduce package installation time
We did an experiment of installing packages:
- using .yarn-cache
- and when the node_modules are already present
With .yarn-cache, the installation time was still high, as yarn had to resolve packages and construct the node_modules folder; with node_modules already present, installation completes in just 2–3 seconds.
So, we went ahead with caching node_modules to speed up package installation.
Next, we set up optimistic caching: instead of maintaining the cache in every job, we created a separate job that runs only on the master branch.
Let’s name our new job cache_node_modules:
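A sketch of what such a job could look like (the cache key and stage names here are illustrative):

```yaml
cache_node_modules:
  stage: cache
  only:
    - master
  cache:
    key: node-modules-master   # illustrative key name
    paths:
      - node_modules/
    policy: pull-push          # this job both restores and updates the cache
  script:
    - yarn install

lint:
  stage: lint
  cache:
    key: node-modules-master
    paths:
      - node_modules/
    policy: pull               # other jobs only download the cache
  script:
    - yarn install             # completes in seconds when node_modules is cached
    - yarn lint
```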
Notice here that we have specified pull-push as the policy which takes care of extracting and updating the node_modules cache.
Now when a lint job runs it just takes around 3 seconds for packages to install.
By doing this, we reduced the time from 10.7 minutes to 3.5 minutes, a speedup of around 3x (300 percent).
Update node_modules cache only when required
Most of the time in a project, we develop features, fix bugs, or perform small tasks that do not require adding new packages.
This means we do not have to run the cache_node_modules job every time; it only has to run when the yarn.lock file changes. GitLab has a special keyword called changes that takes care of this condition.
We updated our GitLab configuration file accordingly:
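A sketch of the updated cache job using the `only`/`changes` syntax (key and stage names are illustrative):

```yaml
cache_node_modules:
  stage: cache
  only:
    refs:
      - master        # only on the master branch
    changes:
      - yarn.lock     # and only when dependencies actually change
  cache:
    key: node-modules-master
    paths:
      - node_modules/
    policy: pull-push
  script:
    - yarn install
```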
The cache job now runs only when yarn.lock changes on the master branch, which saves us around 8 minutes per pipeline.
After all these changes, we saved around 15 minutes per pipeline. This not only resulted in faster deployments but also improved developer productivity when shipping features to production.
Do let us know in the comments below on how we can improve it further or if you have tried other interesting approaches.
In the next blog, we will be talking about how we improved our deployment by using a lambda function.