[Terragrunt GitOps - Part 6] Conclusions and discussion

Recap

In this series of articles, I presented my idea and design for deploying Terragrunt code in a multi-customer, multi-environment setting. I explained how to prepare your GCP environment for it, discussed how the "runner" module (the Cloud Build triggers module) is written and the rationale behind it, and showed how to "onboard" a customer into the solution and then how to perform day-to-day tasks.

I haven't shown it in this series, but you can easily add Customer 2 with their environments - please treat it as an exercise if you'd like to replicate my solution!

The cost of this solution for me was... almost zero. Most of the services fall within the free tier, with the slight exception of Artifact Registry (if the image is larger than 0.5 GB).

Disclaimer

The core of this series is to give you an idea to work with, some inspiration, and a template to expand on. This is NOT a ready-made solution that you can just pull from GitHub and deploy in 3 minutes - that was never my intention.

This solution was tested only very lightly, so bugs are to be expected. Again, anyone who wants to use portions of this solution has to look into the code, adapt it, and fix any bugs they encounter.

I put an MIT license on the code, so please remember to credit the author of the solution and include the license terms in any modified product.

Repositories

The links to the repositories are here:

  • terraform-random-sample-module (link)

  • terraform-storage-sample-module (link)

  • terragrunt-runner-module (link)

  • terragrunt-example-envs (link)

Limitations and points for improvement

While I included many features in the solution, it's far from perfect. Some of the possible points that can be addressed are listed below.

The first one concerns run-all; to quote the relevant warnings:

"Using run-all with plan is currently broken for certain use cases"

and also:

"Right now, our general recommendation is to build very limited CI/CD pipelines for infrastructure that changes often. For example, you can have a CI/CD pipeline just for rolling out your application. Or you can have one just for making changes to the ASG. This can help you work around a lot of the issues mentioned above.

In this model, you are unlikely to use the plan-all variant, and instead will be using plan and apply on a single module. In this case, it will work very similarly to plain terraform, although the main difference is that you will most likely want to use absolute paths to store the plan output so that it is findable (as you discovered)."

As you have seen in this series, I use run-all commands. They work, but keep in mind the limitations quoted above (and also the necessity to add those mock outputs).
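For reference, this is roughly what the mock outputs look like in a unit's terragrunt.hcl. This is only a minimal sketch: the module paths and output names are illustrative placeholders, not taken verbatim from the example repositories.

    # terragrunt.hcl of a unit that depends on another unit (names are hypothetical)
    dependency "storage" {
      config_path = "../storage"

      # Mocked values are used while the dependency has no real outputs yet,
      # which is what makes `run-all plan` work in a fresh environment.
      mock_outputs = {
        bucket_name = "mock-bucket"
      }
      mock_outputs_allowed_terraform_commands = ["validate", "plan"]
    }

    inputs = {
      bucket = dependency.storage.outputs.bucket_name
    }

Restricting the mocks to validate and plan ensures that apply never consumes a fake value.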

  • add automation for the "bootstrap" phase. Depending on the use case and requirements, many of the steps can (and should) be automated; in my opinion, especially pushing the Docker image to Artifact Registry. You can use many methods for that; the GCP example foundation comes to mind as one possibility.

  • make pre-commit run on pull request creation, not only locally. Currently, the engineers have to remember to install pre-commit on their machines (see the trigger sketch after this list).

  • handle the dependency between onboarding and deployments better. Currently, one has to remember to onboard/add new environments first and only invoke the modules later. Changing both at once may result in unexpected behavior.

  • use releases instead of tags. That way, you're not relying on an individual running git push --follow-tags locally.
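As a sketch of the pre-commit point above: a PR-triggered Cloud Build trigger could run the hooks server-side. This is only an illustration of the idea, not part of the runner module; the trigger name, repository owner, and builder image below are placeholders, and in practice the image would need Terraform, Terragrunt, and the other hook dependencies baked in.

    # Hypothetical trigger that runs pre-commit on every pull request
    resource "google_cloudbuild_trigger" "pre_commit" {
      name = "pre-commit-on-pr"

      github {
        owner = "your-github-org"          # placeholder
        name  = "terragrunt-example-envs"

        pull_request {
          branch = ".*"                    # run on PRs targeting any branch
        }
      }

      build {
        step {
          # Placeholder image; reusing the runner image would make all hooks available.
          name       = "python:3.12-slim"
          entrypoint = "bash"
          args       = ["-c", "pip install pre-commit && pre-commit run --all-files"]
        }
      }
    }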