Terraform project structure
###########################

:date: 2021-09-14
:summary: My preferred Terraform project structure.


Recently I've been using `Terragrunt <https://terragrunt.gruntwork.io/>`_ and I
have thoughts on what it offers and whether it is useful. My usage has been in
an existing project that follows the Gruntwork guidelines closely and has a
paid subscription to the Gruntwork library. These opinions are my own and are
based on my recent experience with Terragrunt, as well as on managing small and
medium infrastructure with Terraform for the last few years, both as a single
developer and as part of a small team.

The main point of Terragrunt, as I understand it, is to keep you from repeating
yourself in code. I am not a fan of copying and pasting big blocks of code, nor
of having to change the same value in a few different places. So for me,
keeping code DRY is a worthwhile endeavor.

Keeping modules DRY
-------------------

Terragrunt works by using modules. I like Terraform modules. Even the Terraform
documentation suggests that you don't have a single top level module for your
entire infrastructure. A single module makes development more difficult, with
more merge conflicts. It also makes deploying for testing purposes more
difficult, because Terraform will keep trying to delete resources that aren't
in your code (because someone else working in a different branch has made
changes for some other reason). You can work around that with the
:code:`-target` option, but that is error-prone and gets tiresome after a
while.

In a previous project I worked on, we had roughly one module for each service.
We had quite a lot of code that was copied from one module to another (for
example, when creating a new RDS instance we also created the subnet group, the
security group for the client, etc.). Over time we saw clearly what code was
shared between the different modules, so we created a :code:`library` directory
and started adding sub-modules there. After a while we had a nice library of
reusable sub-modules and things were good.
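
As a rough sketch of what that looked like (the directory names, the sub-module
and its inputs are made up for illustration, not taken from that project), a
service module would call a shared sub-module through a relative path:

.. code:: terraform

    # billing/main.tf (hypothetical service module)
    variable "vpc_id" {
      type = string
    }

    # The sub-module lives in the same repository, under library/, and
    # bundles the RDS instance with its subnet group and security groups.
    # Its input names are assumptions made for this example.
    module "database" {
      source = "../library/rds"

      name   = "billing"
      vpc_id = var.vpc_id
    }

Because the source is a local path, a change to the sub-module and to its
callers lands in the same branch and the same PR.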

Because we waited a bit before creating a new sub-module, they were pretty
stable. When we did have a change to a sub-module that we wanted to deploy
across the entire infrastructure, we would open a branch, work on all the needed
changes there, test them in one of the testing environments and then open a PR
that had all of the changes (the sub-module changes, the calling modules'
changes, any fallout from those changes).

This process suited us nicely. The PR had the entire picture and we could really
see if the change improved anything (for example, an output added to one module
for use in a different module is clearly justified when you see it being used).
We did on occasion have conflicting changes and we did have to use targeted
:code:`plan` and :code:`apply`, but as far as I can remember less than once a
quarter.

Terragrunt recommends splitting the repository in 2, one for sub-modules and one
for actually deployed modules. Then you create :code:`terragrunt.hcl` files that
list the sub-modules needed with the Git ref used. This allows you to use the
RDS database sub-module from today but the auto-scaling group from last year. I
see little point in this.
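
To make that concrete, here is a minimal sketch of two such files. The
repository URL, paths, refs and inputs are placeholders I made up; only the
:code:`terraform { source = ... }` and :code:`inputs` structure is Terragrunt's.

.. code:: terraform

    # live/prod/rds/terragrunt.hcl (hypothetical path)
    terraform {
      # Pinned to a recent tag of the sub-modules repository.
      source = "git::git@github.com:example-org/infrastructure-modules.git//rds?ref=v2.3.0"
    }

    inputs = {
      instance_class = "db.t3.medium"
    }

    # live/prod/asg/terragrunt.hcl (hypothetical path)
    terraform {
      # Still pinned to a tag from last year.
      source = "git::git@github.com:example-org/infrastructure-modules.git//asg?ref=v1.0.4"
    }

    inputs = {
      min_size = 2
    }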

The change process then goes as follows: one PR for the sub-modules repository
and one for the live repository (or more, we haven't gotten around to
discussing environments yet). Now I hear that the recommendation has changed.
The new recommendation is that each sub-module lives in a separate repository,
so even more PRs for each change. That one change of adding an output and using
it became less obvious and requires more work; I wouldn't call it a win. I
wonder if there's any place that has two repositories, one for the code and one
for the tests, where you change the code and, once it's merged, go to the tests
repository and update the tests there to use the new code to see if it passes?

Another outcome of this way of working that I keep seeing is that, because
changes are not applied (or planned) before the sub-module changes are merged,
errors and issues are only found later, which triggers more PRs.

Environments, remote states and workspaces, oh my
-------------------------------------------------

Another way that Terragrunt keeps your code DRY is by generating the Terraform
backend configuration, because Terraform doesn't allow variables there. So you
save less than 10 lines. Cool. It also means you won't accidentally (because
you copied that code from another module) use the same state location for two
modules and have them delete each other's resources. That has happened to me
more than once, but you see it clearly when you first run
:code:`terraform plan`, so it's very easy to catch.
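
For reference, this is roughly the block in question, followed by the kind of
Terragrunt configuration that generates it. I'm assuming an S3 backend here,
and the bucket, key and region are placeholders.

.. code:: terraform

    # Plain Terraform, repeated in every module. The values must be
    # literals, and the key must be unique per module.
    terraform {
      backend "s3" {
        bucket = "example-terraform-state"
        key    = "billing/terraform.tfstate"
        region = "eu-west-1"
      }
    }

    # Terragrunt, written once in the root terragrunt.hcl. It generates a
    # backend.tf for each module with a key derived from the module's path.
    remote_state {
      backend = "s3"

      generate = {
        path      = "backend.tf"
        if_exists = "overwrite_terragrunt"
      }

      config = {
        bucket = "example-terraform-state"
        key    = "${path_relative_to_include()}/terraform.tfstate"
        region = "eu-west-1"
      }
    }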

Now, the folks at Gruntwork suggest you create a directory for each
environment. From what I can see, that means you copy your
:code:`terragrunt.hcl` file to each directory and modify it slightly (I think
you can see where I'm going with this). If your project has a different module
for each environment, this is a win, no doubt about it. I've seen projects like
that and they're really a pain to manage.

Before I ever heard about Terragrunt, I had this exact problem. I solved it
using Terraform workspaces and a simple convention. Each environment would have
its own workspace (let's say that the default workspace is the sandbox, but
that's up to you). Each module would have a :code:`tfvars` file for each
environment. The workflow for deploying to the :code:`dev` environment would
look like this:

.. code:: shell

    terraform workspace select dev
    terraform plan -var-file=dev.tfvars -out=tfplan
    # Review the changes.
    terraform apply tfplan
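
The :code:`tfvars` files themselves only hold the values that differ between
environments. A minimal sketch, with a made-up variable and values:

.. code:: terraform

    # variables.tf, shared by every environment
    variable "instance_type" {
      type        = string
      description = "Instance type for this service (hypothetical example)."
    }

    # dev.tfvars
    instance_type = "t3.micro"

    # prod.tfvars
    instance_type = "m5.large"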

To make life a little easier I also added the following snippet to each
module:

.. code:: terraform

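    locals {
      # abspath() keeps this working in a root module, where path.module is ".".
      module = basename(abspath(path.module))
      # The default workspace doubles as the sandbox environment.
      env    = terraform.workspace == "default" ? "sandbox" : terraform.workspace
    }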
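
Those two locals can then feed names and tags, so nothing in the module
hard-codes an environment. A small sketch that builds on the snippet above (the
:code:`name_prefix` and :code:`common_tags` names are my own invention for this
example):

.. code:: terraform

    locals {
      # For a module directory called "billing" in the dev workspace
      # this evaluates to "dev-billing".
      name_prefix = "${local.env}-${local.module}"

      common_tags = {
        Environment = local.env
        Module      = local.module
      }
    }

    output "name_prefix" {
      value = local.name_prefix
    }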

Yes, this is copied code and, along with the backend configuration, it adds up
to over 10 lines of mostly duplicated code. However, when I compare it to the
:code:`terragrunt.hcl` files, this is peanuts. I checked a few modules in the
codebase I'm working on and we have :code:`terragrunt.hcl` files that are
hundreds of lines long and share all but a few lines.

I found that this convention is easy to document and easy for new developers to
pick up. It uses existing tools, so you can use your existing knowledge, and
you get all of the benefits of not adding another tool to your workflow.

Workflow
--------

Terragrunt builds a directory for each module (and each environment, obviously),
clones the Git repos you mentioned at the refs you specified and then mucks
about with the Terraform commands and plan files to stitch everything together.
Even on paper this doesn't look like a good idea, and in practice it isn't one:
it makes debugging issues difficult.

It also suggests that you can have different versions of the sub-modules in use
across different environments, putting the emphasis on having the :code:`main`
branch match exactly what is deployed in each environment instead of on
avoiding drift between the different environments.

Conclusions
-----------

This post is a critique of the Gruntwork recommended setup and workflow, and if
you read it all you will see that I think there are better and easier ways. You
can compare Terragrunt to a badly managed Terraform project and find that it
helps you. But when you compare it to one that uses the suggested convention,
it makes things more difficult, doesn't deliver on the promise of keeping your
code DRY and promotes bad habits.

I didn't plan on reviewing Terragrunt until I used it. Terragrunt makes life
less enjoyable. It has a convoluted workflow locally (with those bloody git
clones), it makes debugging issues difficult and the upside is just not there. I
would recommend that anyone thinking about adopting Terragrunt first read the
`workspaces documentation
<https://www.terraform.io/docs/cli/workspaces/index.html>`_ and think hard
about the code review, testing and development workflows.