Bots Write Bad Terraform and it’s All Your Fault
We all bear some responsibility for improving the quality of Terraform generated by LLMs. Learn how you can do your bit to help.
LLMs are only as good as their training data. When it comes to Terraform, much of the publicly available code is pretty bad. This results in bots generating poor quality Terraform.
It gets worse. Researchers have warned about model collapse. This is a very real problem with Terraform. LLMs generate bad Terraform, engineers who don’t know any better publish the code, this becomes training data. It’s a compounding loop.
As the old saying goes, “garbage in, garbage out”.
Before we look at how we can generate better Terraform, let’s look at some of the worst examples.
Resource Types in Names and Labels
Naming things is the second hardest problem in computer science. Much of the Terraform I see confirms this.
It doesn’t help that the Terraform documentation is full of bad examples that encourage poor naming practices.
How many times have you seen a variant of `resource "aws_iam_role" "my_role"`? This results in references like `aws_iam_role.my_role.arn`. Why do we need `_role` twice? We know it is a role because it’s there in the type. It adds zero value. Leave it out.
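As a sketch of the difference (the app and role names here are hypothetical), compare the redundant and the clean label:

```hcl
# Redundant: "role" appears in both the type and the label,
# giving references like aws_iam_role.my_role.arn
resource "aws_iam_role" "my_role" {
  name = "my-app-prod-glue"
}

# Clean: the type already tells us it is a role,
# giving references like aws_iam_role.glue.arn
resource "aws_iam_role" "glue" {
  name = "my-app-prod-glue"
}
```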
Where there is only ever going to be a single instance of a resource, use `this`. For IAM roles and policies, use the service name like `glue`, or a common abbreviation like `sfn` for a Step Function. For everything else, use something that makes it clear what it relates to.
There is a similar issue with naming resources. Let’s stick with the IAM role example:
```hcl
resource "aws_iam_role" "example" {
  name = "example-role"
  # ...
}
```
The ARN for this role will be `arn:aws:iam::012345678910:role/example-role`. The `-role` suffix in the name is redundant. It’s already part of the ARN. You don’t need `-role` on the end of every entry in the IAM roles page in the console.
I generally use `<app-name>-<environment>` for `name`-ing resources. When it comes to IAM roles and policies I use `<app-name>-<environment>-<service>`. Where there are multiple related resources, such as Lambda functions, we need to add an extra qualifier. I use `<app-name>-<environment>-<service>-<qualifier>`, where the qualifier would be the function name.
Where resources are likely to be replaced, consider using `name_prefix`. This allows you to use `create_before_destroy` and minimise downtime.
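Putting the naming convention and the replacement strategy together, a sketch might look like this (the variables and the `glue_assume` policy document are hypothetical, assumed to be defined elsewhere):

```hcl
resource "aws_iam_role" "glue" {
  # Terraform appends a unique suffix to name_prefix, so a
  # replacement role can exist alongside the old one briefly
  name_prefix = "${var.app_name}-${var.environment}-glue-"

  assume_role_policy = data.aws_iam_policy_document.glue_assume.json

  lifecycle {
    create_before_destroy = true
  }
}
```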
Comments Everywhere
LLMs love adding comments everywhere. It’s not uncommon for a model to output a comment before each resource like so:
```hcl
# IAM Role
resource "aws_iam_role" "my_role" {
  # ...
}
```
Wow! Amazing! Thanks for that super useful comment 🙄
Comments should be used sparingly. They shouldn’t duplicate basic information already included in code. Comments should describe the “why”, not the “what”. Your comments should inform the reader.
If you start thinking this file is getting a bit big, “I should add section headings”, stop. Don’t add section headings. Split the resources out into multiple files.
Everything in main.tf
By default many GenAI coding tools will continue to append to `main.tf`. This leads to poorly organised code, with section headings.
Terraform will happily read all the .tf files in the directory. Take advantage of this. Group related resources into a single file. Keep things organised and well structured.
Keep `main.tf` minimal. Use it for your global data sources like `data.aws_caller_identity.current`. Split everything else out: `versions.tf` for your provider versions, `variables.tf` for the variables, `outputs.tf` for what the module `output`s. Create additional files for different resources. Use `s3.tf` for an S3 bucket or `network.tf` for security groups and other networking resources.
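A minimal module layout following this approach might look like the sketch below (the resource files beyond those mentioned above are illustrative, chosen to suit a hypothetical module):

```
main.tf       # global data sources only
versions.tf   # terraform and provider version constraints
variables.tf  # input variables
outputs.tf    # module outputs
iam.tf        # roles, policies and attachments
s3.tf         # S3 bucket and its configuration resources
network.tf    # security groups and other networking
```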
S3 Bucket Resources
As of the time of writing this post, the current stable release of the AWS provider is 6.13.0. Back in February 2022, version 4 of the AWS provider split the various S3 configuration options out into separate resources. The older syntax was deprecated, but is yet to be removed. Three and a half years later, LLMs still insist on using the deprecated syntax.
This isn’t an issue of training data cut-offs. Version 4 of Anthropic’s Claude Sonnet and Opus models have a cut-off date of March 2025. Google’s Gemini models’ knowledge cut-off is January 2025. OpenAI’s cut-off dates are harder to find, but it appears they are using slightly older training data. Even so, the cut-off dates are at least two years after the release of the new S3 resources.
Many of the recently published Medium posts and modules on GitHub still use the old S3 resource properties. If you’re creating S3 buckets with Terraform in 2025, you should be using the new targeted resources. The alternative is rewriting that code when a future major release eventually removes the old properties.
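To make the difference concrete, here is a sketch of the two styles (the bucket name is hypothetical, and the two variants are alternatives, not meant to coexist in one configuration):

```hcl
# Deprecated (pre-v4 provider): configuration inline on the bucket
resource "aws_s3_bucket" "logs" {
  bucket = "my-app-prod-logs"

  versioning {
    enabled = true
  }
}

# Current: a bare bucket plus separate, targeted resources
resource "aws_s3_bucket" "logs" {
  bucket = "my-app-prod-logs"
}

resource "aws_s3_bucket_versioning" "logs" {
  bucket = aws_s3_bucket.logs.id

  versioning_configuration {
    status = "Enabled"
  }
}
```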
jsonencode()ing Policies
Some of the blame for this rests with the Terraform docs. They are full of example inline policies that are `jsonencode()`ed. This prevents reuse and leads to cluttered code.
Create policies using `aws_iam_policy_document` data sources. Use the policy name for the data source label. Here is an example:
```hcl
resource "aws_iam_role" "lambda" {
  name               = "app-env-lambda-example"
  assume_role_policy = data.aws_iam_policy_document.lambda_assume.json
  # ...
}

data "aws_iam_policy_document" "lambda_assume" {
  statement {
    actions = [
      "sts:AssumeRole"
    ]

    principals {
      identifiers = ["lambda.amazonaws.com"]
      type        = "Service"
    }

    condition {
      test     = "StringEquals"
      variable = "aws:SourceAccount"
      values = [
        data.aws_caller_identity.current.account_id
      ]
    }
  }
}

data "aws_iam_policy_document" "lambda" {
  statement {
    # ...
  }
}

resource "aws_iam_policy" "lambda" {
  name   = aws_iam_role.lambda.name
  policy = data.aws_iam_policy_document.lambda.json
  tags   = var.tags
}

resource "aws_iam_role_policy_attachment" "lambda" {
  role       = aws_iam_role.lambda.name
  policy_arn = aws_iam_policy.lambda.arn
}
```
Swiss Army Knife Modules
Swiss Army knife modules try to do everything. Unfortunately they often deliver less value than just using the resources directly. A telltale sign is the LLM exposing most properties as variables. Often the names have been changed to confuse, rather than protect, the innocent.
I have written a longer post about why your Terraform module needs an opinion. I won’t repeat that here. Build modules that have opinions.
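As a hedged sketch of the difference (both interfaces are hypothetical), a Swiss Army knife module re-exposes nearly everything, while an opinionated module asks only for what genuinely varies:

```hcl
# Swiss Army knife: every provider property re-exposed, often renamed
variable "bucket_acl" {}
variable "enable_versioning" {}
variable "sse_algorithm" {}
# ...dozens more...

# Opinionated: versioning and encryption are decisions the module
# has already made; callers supply only what actually varies
variable "app_name" {
  type = string
}

variable "environment" {
  type = string
}
```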
The Fix
How do we prevent LLMs generating bad Terraform? First we stop publishing bad Terraform. We need to improve the quality of the training data the bots consume.
Have someone competent review your Terraform. No, GitHub Copilot reviews don’t count. It doesn’t do a great job of reviewing Terraform. I suspect this is because it is using the same bad training dataset that generated the bad Terraform.
The low hanging fruit should be caught by a linter. I started the “Dave says” TFLint ruleset to enforce some of the basics. The rest needs a human. Microtica’s Infrastructure Code Review Guide provides a good starting point for the types of questions you should be asking during a peer review.
Back in 2022, when we were all hand crafting artisanal Terraform, we planned our implementation, then bashed it out. It should be no different with an LLM. Plan, then build.
Using a vague prompt and hoping the bot can nail it first time rarely ends well. Prepare a detailed prompt describing what you want. Personally I find Claude is great for this. Use the bot to refine your plan, then ask for a prompt to generate it. Make it clear the LLM isn’t to generate code during this session.
If you don’t already have documented standards for your Terraform, you have a new task for today, write up your standards. Keep it simple. A markdown file is better than Confluence (or a MS Word doc 🤮). Expect this to evolve.
Each coding assistant has its own system for handling guidance documents. AGENTS.md is gaining traction, so maybe one day we will have a common standard. Regardless of the mechanism, share your standards with the bot. This gist contains the guidance document I use. It covers the items in this post and more. Take it and adapt it for your team.
We all bear some responsibility for improving the quality of Terraform generated by LLMs. Please do your bit to help. Thank you! 🌊
Need Help?
Do you need some help implementing the ideas in this post? Get in touch! I am happy to help.
Like and Subscribe
Did you like this post? Please subscribe so you don't miss the next one. Do you know someone who would benefit from this article? Please share it with them.
Proactive Ops is produced on the unceded territory of the Ngunnawal people. We acknowledge the Traditional Owners and pay respect to Elders past and present.