Terraform is not the golden hammer

scrollaway · on Sept 19, 2021

It's very difficult to communicate what Terraform's strengths and weaknesses are to someone who's never used it or IAC in general.

Spend enough time playing with it and understanding it, and you'll end up like me, thinking about all the shit you configure left and right, such as hooking up Stripe's secret keys, Google Analytics and the webmaster console, and just about everything else we configure via web interfaces, and you'll think:

Why can't we use Terraform for this as well? Manage these SaaS products the same way we manage the rest of our cloud, test and audit changes, automatically roll secrets and update anything that needs updating the moment you change a setting.

Ah well. Not enough APIs out there. And it's difficult to write and maintain terraform plugins for these throwaway cases especially if they are going to use private APIs. Anyone know if Pulumi plugins are easier to write?

vasco · on Sept 19, 2021

I wish I could use it for everything as well. Every other tool the business depends on just scares me to shit that its config isn't in code and we can't have a proper backup of it. At least Datadog supports terraform for most things, and not only do we manage all our infra through terraform, we manage all our monitoring with it too. I doubt very much I'll ever go back to non-monitoring-as-code, if that's even a term.

Infrastructure, all the monitoring, as well as all the on-call rotation configurations (and anything else that is in that loop) should all be in code, and all changes should be reviewed the same way application code is. If they aren't, you can't really trust you're gonna be alerted properly when things start breaking.

I wish I could use it for personal things too, I'd rather have my bank account settings, my government tax information, yada yada in a personal terraform repository for example. Change of address? Commit a change, check if the plan is good and apply to change it everywhere. Though having lots of experience with Terraform I can only imagine what the equivalent of trying to delete an S3 bucket that still has data in it is for a bank account.

OJFord · on Sept 19, 2021

I so agree; I tried/am trying to write an Android provider - currently just have app (un)installation working, and not very well, I expected settings management to be the hard part, but egh.

Why can't everything have nice public APIs! And, while I'm at it, some sort of all-encompassing ticketing system, hell even if it were Jira. 'Pothole', assign local council. Blocked on '2021 roadworks funding increase', backlogged. Assigned to councillor. Won't fix. Ok - maybe I'm not making it sound great, but at least you could see some reasoning, and what the blockers are. Follow the chain to work out that 'communal lobby needs repainting' hasn't been acted on by building management company because, ultimately, of global supply chain disruption and the contractor's supplier's supplier can't get any paint ingredients.

crabmusket · on Sept 19, 2021

I've used https://github.com/Mastercard/terraform-provider-restapi successfully with a cloud provider which provides a suitable HTTP API. There was a bit of fiddling with JSON formatting and their API docs, but it wasn't too hard all in all.

But like you say - now I've done that, I want to do it for every UI that I'm forced to log in to!

weitzj · on Sept 19, 2021

Yes. We use the Rest Api Provider extensively to provision ElasticSearch and Kibana.

scrollaway · on Sept 19, 2021

Wow, neat, thanks for the link. Maintained by Mastercard, eh? Anyone else on HN used this or worked on it?

MuffinFlavored · on Sept 19, 2021

what’s the backing database for this, the .tfstate file? do the resources/secrets you create end up getting “backed up” (committed) to git?

crabmusket · on Sept 20, 2021

That's right, which is why they usually recommend not committing tfstate but using some remote backend.

We've gotten away without that because the APIs we use don't contain any secrets. (The auth token for making the requests is just an environment variable.)

mason55 · on Sept 19, 2021

This is how I feel about any kind of configuration after moving all my personal systems to NixOS.

“What do you mean run an installer and update these files…”

brightball · on Sept 19, 2021

IMO, this is an area where I think Terraform + Ansible pairs so well together.

If there’s ever a gap in what Terraform offers you can pretty easily fill it with Ansible.

weitzj · on Sept 19, 2021

And this setup is also encouraged by HashiCorp (at least I saw a talk by them). Use ansible for your “smart” sequential executions and Terraform as a sane wrapper for state.

Octabrain · on Sept 19, 2021

I’ve seen (and fixed) so many ugly messes at this point made as a result of mixing and wrapping tools with different purposes together like Ansible + Terraform that it’s something I strongly discourage. I recommend keeping the boundaries and responsibilities of the tools clear. In this case, Terraform for the creation of resources and Ansible for the configuration of those resources. In my opinion, the result is a much simpler and more maintainable ecosystem in the long run.

chucky_z · on Sept 19, 2021

I’ve found running Terraform via Ansible to be a pretty good experience.

weitzj · on Sept 19, 2021

So the other way around. Do you employ a GitOps approach this way?

I find it hard to figure out how to use GitOps with Ansible? How do you make a PullRequest which indicates that something should get deleted? You still would have to keep around an ansible playbook for the stuff you want to delete.

pram · on Sept 19, 2021

Ansible is also amazing with Packer as the provisioner.

satya71 · on Sept 19, 2021

Pulumi has dynamic providers. You define the CRUD operations and Pulumi manages the state.

reilly3000 · on Sept 19, 2021

After a little grokking it’s a surprisingly easy way to manage arbitrary API resources. They are just functions in your language of choice and can accept parameters from any dependency. It’s also possible to roll your own provider, but dynamic providers cover all kinds of use cases.

kall · on Sept 19, 2021

Your intuition is right on pulumi. Creating these kinds of extensions is minimal effort. You can start out by adding a "dynamic resource" class to your infra codebase and extract it into a plugin later, or not.

These are not the same as "real" pulumi providers that run across all supported programming languages, but I think they are a good enough fit for the cases you mention.
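To make the "you write CRUD, the engine keeps the state" idea concrete, here is a minimal stdlib-only sketch. It is not Pulumi's actual API: `DynamicResource` and `Engine` are hypothetical names made up for illustration, and the real dynamic provider interface has more to it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DynamicResource:
    """User-supplied CRUD callbacks for one kind of resource (hypothetical shape)."""
    create: Callable[[dict], str]        # creates the thing, returns its id
    update: Callable[[str, dict], None]  # updates the thing with that id
    delete: Callable[[str], None]        # deletes the thing with that id


@dataclass
class Engine:
    """Toy engine: it owns the state and decides which callback to run."""
    state: Dict[str, dict] = field(default_factory=dict)  # name -> {"id", "props"}

    def apply(self, name: str, res: DynamicResource, props: dict) -> str:
        current = self.state.get(name)
        if current is None:
            # Nothing recorded yet, so this is a create.
            rid = res.create(props)
            self.state[name] = {"id": rid, "props": dict(props)}
            return "created"
        if current["props"] != props:
            # Recorded props differ from the requested ones, so update.
            res.update(current["id"], props)
            current["props"] = dict(props)
            return "updated"
        return "unchanged"

    def destroy(self, name: str, res: DynamicResource) -> str:
        current = self.state.pop(name, None)
        if current is None:
            return "absent"
        res.delete(current["id"])
        return "deleted"
```

The callbacks only talk to the remote API; they never touch `state`. That split is what makes throwaway integrations cheap to write.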

snom380 · on Sept 19, 2021

> On terraform it's different, because of the tfstate. All the deployed elements are stored in the tfstate, re-running terraform won't update resources that are supposed to be in a specific state but are not.

This is incorrect, and makes me wonder how the author has used terraform. Terraform will certainly detect differences between managed and current state for the resources it manages whenever you do a plan/apply.

The major challenge is that terraform _can only reconcile resources or configuration values it knows about_, and that depends very much on how a particular cloud vendor or terraform provider has modelled resources. I believe the Helm provider is one example where it (at least in the past) hasn't had a good way to reconcile state.
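A rough sketch of that limitation, assuming a hugely simplified model where a resource is just a flat dict of attributes. `plan_changes` is a made-up helper, not anything from Terraform itself:

```python
from typing import Any, Dict, Tuple


def plan_changes(desired: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {attribute: (actual, desired)} for every modelled attribute that drifted.

    Only keys present in `desired` are compared, so anything the provider
    does not model is invisible to the diff.
    """
    return {
        key: (actual.get(key), want)
        for key, want in desired.items()
        if actual.get(key) != want
    }


# What the config says versus what a refresh of the real resource reports.
desired = {"instance_type": "t3.small", "monitoring": True}
actual = {
    "instance_type": "t3.large",      # changed by hand in the console
    "monitoring": True,
    "termination_protection": False,  # not modelled in `desired`, never diffed
}
```

Here `plan_changes(desired, actual)` flags `instance_type` but says nothing about `termination_protection`, which is the "can only reconcile what it knows about" problem in miniature.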

dead10ck · on Sept 19, 2021

This stuck out to me too. Terraform absolutely does check the current reality of the state and applies changes to do what the HCL tells it to. They are either using terraform in a really weird way, or this article was written by someone that doesn't actually run terraform themselves.

jrochkind1 · on Sept 19, 2021

Phew! I thought I was misunderstanding terraform when I saw that.

Perhaps they are working with certain poorly implemented or buggy providers, and not realizing those providers were doing something different than terraform's properly working behavior that most providers implemented.

Bugs happen, but the first step is agreeing on intended behavior so we know what's a bug!

shadycuz · on Sept 19, 2021

I also agree, but I don't have experience with the providers he is using.

When working with AWS it will always reapply the terraform configuration.

A great use of this is for account hardening. You can run it daily to make sure it's configured correctly.
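For the daily run, one option is a small wrapper around `terraform plan -detailed-exitcode`, which exits 0 when there is nothing to change, 2 when there is a non-empty diff, and 1 on error. A minimal sketch, assuming it runs from the directory holding the configuration; `check_drift` and the injectable `runner` are illustrative names of mine:

```python
import subprocess
from typing import Callable, List

PLAN_CMD = ["terraform", "plan", "-detailed-exitcode", "-input=false"]


def run_plan(cmd: List[str]) -> int:
    """Run the plan command in the current directory and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def check_drift(runner: Callable[[List[str]], int] = run_plan) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status string."""
    code = runner(PLAN_CMD)
    if code == 0:
        return "in-sync"  # plan succeeded, empty diff
    if code == 2:
        return "drift"    # plan succeeded, something would change
    raise RuntimeError(f"terraform plan failed with exit code {code}")
```

This only reports drift. Whether the scheduled job should go on to `apply` automatically or just alert someone is a separate decision.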

OJFord · on Sept 19, 2021

It would be even better if you could somehow tell it it should have control over the entire account, so that anything (entire resources I mean, not just changed properties) created outside terraform would be destroyed.

In terms of API use though I suppose that'd be quite expensive to plan - listing every possible AWS resource (in every region!) for example.

joombaga · on Sept 19, 2021

You'd have to go per-region first to avoid a major redesign on the provider (the API is also per-region). I can see something like a `controlled_resource_types` list attribute on the provider that you could set to e.g. `[aws_instance]` to inform the provider it needs to compare the list of resources of the specified type to the state.
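The comparison itself would be cheap once you have the listing. A sketch of the idea, where `controlled_types` stands in for the hypothetical `controlled_resource_types` attribute (it does not exist in the AWS provider) and the listing is assumed to come from one region's API:

```python
from typing import Dict, Iterable, Set


def unmanaged_ids(
    listed: Dict[str, Iterable[str]],
    state: Dict[str, Iterable[str]],
    controlled_types: Iterable[str],
) -> Dict[str, Set[str]]:
    """For each controlled type, return ids that exist in the cloud but not in state."""
    result: Dict[str, Set[str]] = {}
    for rtype in controlled_types:
        extra = set(listed.get(rtype, ())) - set(state.get(rtype, ()))
        if extra:
            result[rtype] = extra
    return result
```

Anything returned here would show up in the plan as a resource to destroy. The expensive part is building `listed`, not the set difference.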

OJFord · on Sept 20, 2021

True. That'd be fine though, if you wanted to be really thorough you could just specify a provider for every region and only actually 'use' the ones you wanted, the others just serving as a dummy to enforce no resources.

sciurus · on Sept 19, 2021

Relatedly, I'm pretty sure the earlier statement

> For some resources like RDS or EKS, it won't check if the resource already exists or not. So if it's missing, nothing is going to happen as it's marked are deployed in the tfstate file

is also wrong.

joombaga · on Sept 19, 2021

It is, at least for most RDS and EKS resources (RDS cluster, cluster instance, parameter group, EKS cluster)

raffraffraff · on Sept 19, 2021

There's also brokenness around terraform for_each and providers. If you have a module that creates a Kubernetes cluster and then applies a helm chart to it, you can't convert it to one that takes a bunch of cluster definitions that it iterates over using for_each. Basically, there is no way to do this in a data-driven way. Sucks.

robertlagrant · on Sept 19, 2021

The author also seems to think DSL means "descriptive languages" and that Helm "even" supports kubernetes, when in fact it's only for that technology.

forty · on Sept 19, 2021

The stage when terraform reads the reality to compare it to the current state is called "refresh". It can optionally be skipped.

kleinsch · on Sept 19, 2021

Surprising that the author is writing a tool for managing your servers, so writes a post about how Terraform isn't great at managing your servers...

It seems like the root of the problem is that the author wants to use Terraform to manage their AWS state, but also wants to use the web console to directly change things, so Terraform gets out of sync. Terraform has a command to handle this - https://www.terraform.io/docs/cli/commands/refresh.html

throwaway894345 · on Sept 19, 2021

As the sibling commenter notes, Terraform has a refresh flag, but I wonder if Kubernetes’ model is better here. Rather than a one-off process that tries to update everything, Kubernetes has many small controllers which are essentially processes on the cluster that just run a control loop. Each controller corresponds to one resource type, so it will just loop over incoming events that pertain to the resource type in question and attempt to reconcile the state of the target resource instance with the desired state. If it fails initially, it will retry with back off. If something doesn’t stabilize after several minutes, an alert can notify a human.

The key differences between the controller approach and the IaC approach are, I think, lots of little processes continuously reconciling state for all resources of a given type (many small loops that touch all resources of a given type on the entire cluster) versus a one-off process that tries to touch just the resources it cares about and if it fails it just gives up.
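A stripped-down sketch of the retry-with-back-off part of such a loop. This is not Kubernetes code: `reconcile` is assumed to be a callback that compares desired and actual state for one resource, tries to fix it, and returns True once they match.

```python
import time
from typing import Callable


def reconcile_with_backoff(
    reconcile: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `reconcile` until it reports success, doubling the wait after each failure.

    Returns False if the resource never stabilised, which is the point where
    a real system would alert a human.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if reconcile():
            return True
        if attempt < max_attempts:
            sleep(delay)
            delay *= 2  # exponential back-off
    return False
```

A real controller runs something like this per event, forever, for every resource of its type. A one-off `apply` is closer to a single attempt with no retry.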

One thing Kubernetes definitely improves upon Terraform is that Kubernetes uses a YAML “assembly language” for its infra as code, but that YAML could be generated by a real programming language. Terraform expects you to write HCL, which is an accidentally re-invented programming language (every IaC tool provider thought static configs like YAML would suffice as a human interface, but as they gradually realized the need for more dynamism, Terraform and others would bolt on one dynamic feature after another until they had a slow, unfamiliar, and counterintuitive programming language). Terraform has a CDK that allows writing in other languages, but I’m skeptical that it liberates you from Terraform’s model of the world (e.g., if I rename a variable in CDK, does it try to destroy and recreate the underlying resource as with Terraform?). I’m also concerned that rather than allowing us to generate YAML in the obvious way, it will require bizarre inheritance patterns like the AWS CDK. I would be curious to hear from folks who have used the CDK.

ClumsyPilot · on Sept 19, 2021

"every IaC tool provider thought static configs like YAML would suffice as a human interface, but as they gradually realized the need for more dynamism, Terraform and others would bolt on one dynamic feature after another until they had a slow, unfamiliar, and counterintuitive programming language"

I constantly see so many people step on the same rake, it's incredible. Tools like Tilt let you use python, it's a much more sensible approach.

wernerb · on Sept 19, 2021

The reasoning you put in also means kubernetes is unsuited to be controlled by terraform. Too many lifecycles (resources) to centrally control. kubernetes custom resources can have dependencies on others, which terraform would need to support as well, and that is not doable to maintain. Keep your kubernetes manifests outside of your terraform state.

throwaway894345 · on Sept 19, 2021

My company manages a lot of Kubernetes manifests with Terraform without issue. Terraform is just generating the manifests in this case; Kubernetes is doing the reconciliation work. More complex than is ideal (i.e., if we were starting out with Kubernetes we probably wouldn’t use Terraform) but it works reasonably well.

leg100 · on Sept 19, 2021

The Hashicorp co-founders considered alternative approaches when originally designing terraform. The actor model was considered but dismissed. That's not a million miles away from the kubernetes reconcile loop model:

> Then we transitioned to an actor-based model where each resource was almost an actor, and there was a message-passing interface between them.

> This allowed the system to be highly concurrent the way Terraform is today, but also confusing for users to deal with and very difficult to build a programming model around, because the ordering of execution was so random and everything was happening concurrently.

https://www.hashicorp.com/resources/terraform-fireside-chat-...

They may still be right. Kubernetes' approach may seem more attractive but terraform is far more pragmatic in its design.

steveb · on Sept 19, 2021

There are a lot of developments around using Kubernetes as an IaC platform for the reasons in your comment. The combination of a standard API model in CRDs + the controller model maps nicely to managing infrastructure and exposing resources to developers.

https://crossplane.io just graduated to CNCF Incubation and each of the cloud providers is working on K8s controllers and code generators (like AWS Controllers for Kubernetes, Google Config Connector, and the Azure Service Operator).

verdverm · on Sept 19, 2021

Terraform accepts JSON format as an alternative to HCL.

I prefer CUE to JSON for TF and many other tools now

throwaway894345 · on Sept 19, 2021

To be clear, the issue isn’t the HCL syntax. You could similarly use Cue to generate HCL. The problem is using Terraform’s dynamic features which were poorly designed.

the_duke · on Sept 19, 2021

Terraform is also working on allowing actual code: https://github.com/hashicorp/terraform-cdk

throwaway894345 · on Sept 19, 2021

I know, I mention that in my comment :)

orf · on Sept 19, 2021

There’s also a refresh flag on plan. It’s always worth using before your apply CI step.

jdub · on Sept 19, 2021

My biggest issue with Terraform is the impedance mismatch between HCL and the rest of the known universe.

When you write a provider, you spend half your time converting data structures from the HCL submitted to your provider into the JSON your target service inevitably expects, then you spend half your time converting the JSON your target service inevitably returns into HCL for Terraform to consume, and then you spend another half of your time fixing bugs and polishing.

It's okay when you're building simple providers, but anything reasonably complex becomes unwieldy. I had a go at building some providers for AWS services that were not supported by Terraform or CloudFormation... and I just retreated to cheesy Lambda custom resources for CloudFormation.

kubanczyk · on Sept 19, 2021

I read you. Half of Terraform's value today is "let's have common aesthetics - especially the examples and snake_case_naming_convention - and convert any REST API in the world to that".

The typical example is to start from there:

https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_V...

and arrive here:

https://registry.terraform.io/providers/hashicorp/aws/latest...

If you review it carefully, it is apparent how much coding effort and how many moving parts were used to perform a transformation which seems disproportionately primitive.
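The purely mechanical part of that transformation looks something like the sketch below. It assumes keys differ only in casing, which is often not true in practice (plenty of attributes are renamed outright), and the helper names are mine rather than anything from a provider SDK.

```python
import re
from typing import Any


def snake_to_pascal(name: str) -> str:
    """instance_type -> InstanceType"""
    return "".join(part.capitalize() for part in name.split("_"))


def pascal_to_snake(name: str) -> str:
    """InstanceType -> instance_type"""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def convert_keys(value: Any, rename) -> Any:
    """Recursively rename dict keys, leaving scalar values untouched."""
    if isinstance(value, dict):
        return {rename(k): convert_keys(v, rename) for k, v in value.items()}
    if isinstance(value, list):
        return [convert_keys(v, rename) for v in value]
    return value


def to_api_payload(config: dict) -> dict:
    """Terraform-style attributes -> API-style request body."""
    return convert_keys(config, snake_to_pascal)


def from_api_response(body: dict) -> dict:
    """API-style response body -> Terraform-style attributes."""
    return convert_keys(body, pascal_to_snake)
```

Everything a real provider adds on top of this (renames, type coercion, defaults, nested blocks versus lists) is where the effort goes.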

gtirloni · on Sept 19, 2021

The upside though is that you're adapting that known universe to a way of working that makes sense for terraform users.

throwaway894345 · on Sept 19, 2021

Unfortunately HCL is also an unnecessary learning curve for users. I still don’t have my head around a list versus a bunch of blocks with the same name, for example. I originally thought it was syntax sugar for a single mechanism, but I’ve had errors for trying to use one instead of the other before.

jdub · on Sept 19, 2021

But I wanted to be a Terraform user. So an alternative interpretation is that, as designed, Terraform is slowing down adaptation of the known universe for Terraform users.

ClumsyPilot · on Sept 19, 2021

Good tools are fit for the world we live in, bad tools require you to "adapt the known universe" to the tool.

reilly3000 · on Sept 19, 2021

I’m partial to Pulumi for this reason. It allows devs to use familiar languages to define infrastructure with their familiar tools, write tests, and even interop with existing terraform.

mvanaltvorst · on Sept 19, 2021

I wish Terraform were less opinionated. It has a very clear set of rules you have to adhere to, and if you try to do anything remotely complex you will encounter barriers left and right.

An example is the fact that `for_each` is not supported on providers [1], an issue with 230 likes which has not been solved since January 2019. This had me resort to a Python script which generates a `.tf.json` file, definitely not ideal. Infrastructure as code sounds great, but in practice it's closer to "infrastructure as a non-standard markup language".

[1]: https://github.com/hashicorp/terraform/issues/19932
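For anyone curious what that kind of generator script can look like, here is a minimal sketch. The region list, bucket names and output file are made up for illustration; the only thing it relies on is Terraform's JSON configuration syntax, where several configurations of one provider are given as a list and a resource picks one with `"provider": "aws.<alias>"`.

```python
import json
from typing import Iterable


def build_config(regions: Iterable[str]) -> dict:
    """Build one aliased AWS provider and one example bucket per region."""
    providers = []
    buckets = {}
    for region in regions:
        alias = region.replace("-", "_")
        providers.append({"alias": alias, "region": region})
        buckets[f"logs_{alias}"] = {
            "provider": f"aws.{alias}",          # reference to the aliased provider
            "bucket": f"example-logs-{region}",  # placeholder bucket name
        }
    return {
        "provider": {"aws": providers},
        "resource": {"aws_s3_bucket": buckets},
    }


def write_config(path: str, regions: Iterable[str]) -> None:
    """Write the generated configuration where Terraform will pick it up."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_config(regions), fh, indent=2)
```

Calling `write_config("generated.tf.json", ["us-east-1", "eu-west-1"])` before `terraform plan` gives you the loop over providers that `for_each` won't.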

throwaway894345 · on Sept 19, 2021

You have to understand that when IaC was new, the marketing was “it’s so simple you can just write YAML/JSON/etc” because frankly the industry was too dumb to understand that “using a real programming language to generate a description of the desired resource state” and “using a real programming language to imperatively reconcile the current and desired states oneself” are different things. So Terraform began with something that resembled YAML in its static-ness, and over time, more power was required so they would bolt on a dynamic feature but were reluctant to give the impression that they were building a programming language so the feature would be as obscure as possible. But that wouldn’t be enough either so they would add still more dynamic features, each comparably obscure until in time they’d built a complete, obscure programming language.

But this wasn’t just Terraform! The entire industry did this too. CloudFormation began as simple JSON, but over time they allowed you to encode the abstract syntax tree of a shitty programming language in your YAML, and CloudFormation would interpret it. However stupid that may sound, in the Kubernetes world, we have Helm which lets you generate YAML with text templates which is honestly the dumbest idea in the world (imagine a compiler that generates syntactically invalid machine code if the input program has an extra white space character).

Of course in all of these cases the answer is staring us in the face: use a static language (YAML, JSON, etc) to describe the desired state, and use a higher level language (like Python or Starlark or Dhall or etc) to generate that static desired state description. The only thing Terraform (or any IaC tool) should care about is the YAML description. That it is generated from Starlark or TypeScript is just an implementation detail.
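A minimal sketch of the "generate the static description from a real language" approach, here for a Kubernetes Deployment. The function names and defaults are illustrative assumptions. It emits JSON, which Kubernetes accepts for manifests just as it does YAML.

```python
import json
from typing import List


def deployment(name: str, image: str, replicas: int = 1) -> dict:
    """Describe a Deployment as plain data; no templating involved."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def render(manifests: List[dict]) -> str:
    """Serialise the manifests as one List object."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2)
```

Because the output comes from a serialiser rather than a text template, a stray space in the input can't produce a syntactically invalid file; the worst case is a well-formed description of the wrong thing.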

Instead of that, though, we get CDKs which are so close, but admittedly I haven’t used them in anger yet.

x3n0ph3n3 · on Sept 19, 2021

One of the best parts of CloudFormation was their introduction of Macros. You can take either your whole template or just a snippet, and perform dynamic transformations by calling a lambda. I'll admit I've gone so far as being able to embed ERB (Ruby) into my templates in order to more dynamically define some resources based on stack parameters. I can also create N resources with common configuration based on the values of a CommaDelimitedList.

throwaway894345 · on Sept 19, 2021

I think the idea here is that macros are neat in any language, but in CloudFormation they can help automate stuff that is only difficult because of CloudFormation, and the macros themselves are harder to use than those in a normal programming language. In all cases, I think it’s strictly less nice than generating your CloudFormation YAML with Python or similar.

x3n0ph3n3 · on Sept 20, 2021

The problem of generating your own YAML is you end up having to maintain multiple copies of nearly identical templates and keeping them in S3. I have done a bit of that and maintaining those as build-time assets rather than run-time assets is less appealing. Granted, that's required whenever you get into the realm of dynamically-determined sets of parameters.

throwaway894345 · on Sept 20, 2021

Yes, this is another problem with cloudformation, although IIRC I just used “aws cloudformation deploy” and let it manage my s3 resources.

kevincox · on Sept 19, 2021

I think this is less that it is opinionated but more that the HCL evaluation feels like a pile of hacks. There are unclear rules on what can be evaluated when and what dependencies are possible. Part of this is so that `plan` can work as it does, but it seems like there are just major gaps in general. For example providers can't depend on resources. This makes it very difficult to for example set up EKS then use the kubernetes provider to manage the resources in the cluster. The solution is obviously separate stacks but that brings in a whole bunch of other problems.

I think Terraform is quite possibly the best tool available, but there are clear flaws with both the model and the implementation. I think if I were to make a Terraform v2 I would make `plan` completely pure. This would avoid the provider issues, make validation and testing in CI easier and a whole bunch of other benefits. Of course there are downsides. For example EC2 instance IDs are random so you can't just include them in your pure plan. You would need some type of placeholder that is used for evaluation. This does cause some issues as it limits the operations that you can do with that value (so you can't pick the instance size based on the random instance ID) but overall I don't think it would be a major issue if the final substitution was handled well by the framework.

cube2222 · on Sept 19, 2021

Terraform definitely has its warts, though, as other commenters wrote, not everything in the article is true (the reconciliation part): dependency resolution blows up in time as your number of resources grows, so you need to split up your statefiles; it can't passively listen for drift happening in a dataflow-like way (that would be awesome); it's not transactional like CloudFormation (which is more of a tradeoff than a con), and more.

It is however a great improvement over the previous ways of doing things, and probably the best out of the current similar alternatives out there (you might mention Pulumi as a strong contender, especially for AWS glue writing).

And - though as per the disclaimer, I may be biased - until a better tool comes up, I'd advise looking for specialized IaC CI/CD tools to ease your path with Terraform, like Spacelift[0].

It can help you with orchestrating dependencies among multiple state files; take care of scheduling regular drift detection/reconciliation without getting in your way and locking your state; give you a policy system for making sure preventable mistakes don't happen (i.e. recreating a resource you definitely never want to recreate); manage your credentials depending on whether you just want to run a plan, or apply your changes, and much more.

I can't imagine doing Terraform again without a tool like it.

Disclaimer: Software Engineer at Spacelift. If you want me to expand on the "and much more" part, you can find a demo-scheduling link in my bio!

[0]: https://spacelift.io

3np · on Sept 19, 2021

The author seems to have some misunderstandings on how Terraform is supposed to work - you should get the "automatic reconciliation" they're saying is missing. Also,

> I run once again the “terraform apply” command. But for some reason, Cloudflare API doesn't answer and I got completely stuck there without the possibility to update with Terraform this field because of linked dependencies.

You should be able to circumvent this with a `-target`.

That being said, I know exactly what they're talking about with helm. IME the helm provider was/is a complete mess and gets inconsistent state a lot. Helm specifically I would also keep out of TF until that is fixed, if ever. I haven't had that happen with other providers, though. Perhaps OP was just really unlucky ending up with the odd half-broken AWS module.

dharmab · on Sept 19, 2021

An alternative opinion: When I worked at a large tech company, my team made a conscious decision to not use terraform. This gave us some key advantages - we were able to adopt new cloud features immediately, months before they were available in tf, and our direct cloud access let us build features that would surprise the teams using tf within the company.

If your core competency isn't dependent on your cloud platform tf is a great tool. But using cloud APIs directly was great for us.

jrsdav · on Sept 19, 2021

> But using cloud APIs directly was great for us

This is fine, I've done it extensively myself for some of the bleeding-edge cloud stuff, but the importance of things like tracking state, managing hierarchical resource dependencies, or retry/back-off logic shouldn't be tossed aside simply because there are gaps in what's available in the Terraform providers. Especially where change management is important (basically any enterprise company).

I'd caution others reading this against abandoning something altogether and writing bespoke IaC tooling simply because the stable approach doesn't cover every (bleeding) edge case.

You'll spend a lot of time reinventing the wheel, and while it's fine for certain situations (like when you only care about desired state, not known state, for instance), you'll move faster (and likely safer) by sticking with tools like Terraform for the bulk of your infra, and augmenting here and there with cloud APIs/SDKs when needed.

dharmab · on Sept 19, 2021

Yes, we did have to implement our own state tracking, retries/recovery, etc- but since we were focused on a limited subset of the cloud API, this was pretty easy.

Thaxll · on Sept 19, 2021

So you re-implemented terraform, but most likely worse. Also, you could have added those missing features and re-used the TF engine; it's very simple to add a new API to an existing provider.

dharmab · on Sept 20, 2021

No, we used an entirely different architecture/paradigm not possible with tf, and had capabilities tf doesn't attempt to provide (such as coordinating migrations with both cloud and application APIs, or managing capacity while upgrading 10000s of CPU cores worth of compute).

oneplane · on Sept 19, 2021

Would the time spent re-implementing a specialised Terraform subset be better spent simply maintaining a private branch of the AWS Provider? You can add your secret/special API without having to do all the other heavy lifting as well.

This makes your own effort for customisation minimal, keeps your knowledge portable, and because your added features can be separated into different files and the provider API is stable, you can also easily backport/fast-forward new changes.

dharmab · on Sept 20, 2021

Our custom code was fewer LOC, more robust and faster than tf. Honestly, it's not hard to beat OSS if you have a team of great engineers. This was far from the only time our internal projects were better than OSS- it gave us advantages for removing bottlenecks in the critical paths.

oneplane · on Sept 20, 2021

So when engineers left the company, was their knowledge portable? And attracting new talent, did they come with X years knowledge for the custom code?

With fewer LOC I'm not sure what you mean, the provider code is pretty small, smaller than custom ansible, salt or puppet modules. Smaller than CDK and Pulumi as well. Sure, you'll have to write Go, but that's about the only hurdle.

Everyone doing a round of NIH for internal tooling is ultimately not making the tide rise.

Edit: don't get me wrong here, writing internal tools to do a job the right way for the right needs isn't "invalid" or something like that, but people often dismiss the rest of the lifecycle of knowledge and maintenance when making something completely custom.

iddqd · on Sept 19, 2021

It usually takes 1-2 years for AWS to roll out their latest updates to the regions I use and by then Terraform is stable.

goodpoint · on Sept 19, 2021

Same here. It's incredible how much effort developers are willing to put into popular "devops" tools when the job could be done faster with 200 lines of Python.

dharmab · on Sept 20, 2021

This comment Gets It.

CSDude · on Sept 19, 2021

What new cloud features were not available in Terraform for months?

dharmab · on Sept 19, 2021

Private preview features for partnered organizations. We had access up to 6 months before the public.

pvtmert · on Sept 19, 2021

E.g. AWS ChatBot is not available in TF yet. TBH AWS haven't even added it to their Go SDK. So, I cannot blame TF. But anyway that's one of the inherent problems of the TF plugin system.

Compare to kubectl. Where you can write plugins in bash/shell and mark with execute bit, put it in somewhere in your $PATH as kubectl-blabla and use it as "kubectl blabla".
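The same trick works with any executable, not just shell. A minimal sketch in Python, assuming the file is saved as `kubectl-blabla` (no extension), marked executable and placed on `$PATH`; the greeting logic is a placeholder:

```python
#!/usr/bin/env python3
"""Toy kubectl plugin: `kubectl blabla [name]` runs this file."""
import sys
from typing import List


def main(argv: List[str]) -> str:
    # kubectl passes everything after the plugin name through as arguments.
    name = argv[0] if argv else "world"
    return f"hello, {name}"


if __name__ == "__main__":
    print(main(sys.argv[1:]))
```

There's no schema, state or retry handling here, which is exactly why it is so easy to write.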

CSDude · on Sept 19, 2021

It's not fair to compare imperative simple shell scripts with the things Terraform does. It has schema validation, state comparison, retries, failure handlers etc.

Also, just as you can write extensions to kubectl, you can write your own provider in Terraform if it does not exist. See https://registry.terraform.io/modules/waveaccounting/chatbot...

Also, Chatbot does not have a public API, which is why it can only be configured via CloudFormation. So the expectation is not fair either.

I've seen CloudFormation get features years later, e.g.:

2021 - https://aws.amazon.com/about-aws/whats-new/2021/05/amazon-dy... 2015 - https://aws.amazon.com/about-aws/whats-new/2015/07/amazon-dy...

jen20 · on Sept 19, 2021

NAT Gateways is another notable feature that took CloudFormation months yet Terraform had on day 1.

If you can configure something via CloudFormation you can integrate it via Terraform et al also, since they have resources representing CloudFormation stacks.

lincler · on Sept 19, 2021

This! It's not like you can't go beyond what Terraform offers by default. Running CloudFormation stacks from Terraform is a neat way of working around missing APIs/integrations. And that's exactly what my team did when Terraform was missing a lot of Lambda functionality. We just declared the CloudFormation stack for the lambdas and then called it from Terraform.

jen20 · on Sept 19, 2021

There’s no reason you can’t do something similar with Terraform either - plugins speak gRPC and thus could be implemented in Python, Node.js or Rust.

However, if AWS have not published metadata for a given service to be used across their various SDKs, it’s hard to take that service particularly seriously, so I’m not sure I’d bother with this.

pojzon · on Sept 19, 2021

I don’t have an issue with terraform, as it has very clearly defined use cases.

I have issues with all the providers that make no sense like application configuration providers or all the flavors of kubectl providers..

Those are often very low quality and have various issues dedicated solutions don’t have.

An example could be helm and helm_provider. The former just works; with the latter I’m constantly running into weird bugs that break terraform state.

nrvn · on Sept 19, 2021

One of the biggest reasons that has kept me away from terraform, apart from the esoteric language, is that terraform modules are always a few steps behind the upstream public cloud offerings.

In the sense that whenever there’s a new API or service available in any of public clouds and their official SDKs there will always be a delay before this new service/feature/API will become available in terraform.

First time I encountered it with GKE private clusters 3 or 4 years ago. Now it is AWS Keyspaces.

The second biggest reason is that whenever you have a requirement for hybrid or multicloud, you are left with the rigidity of HCL. It is probably doable, but for what sake?

Solution: get a real language, write a STATELESS configuration management(IaC) system for your own needs and maintain it. The majority of public and private cloud providers ship SDKs in most popular languages that will help you build your own software solution and reduce your dependence on a third party which I would put under progressing operational risk category. Yaml/json/cue/toml for end user configs would suffice.

Example: for one of my previous projects we built a tool for a hybrid AWS-openstack setup, and were managing a dozen busy environments with it.

xyzzy_plugh · on Sept 19, 2021

This is my preference as well. I've done everything from makefiles and bash scripts to a monolith Go program that statelessly provisions/tears down resources.

Even makefiles are pretty straightforward, though you really want operations to only trigger when checksums differ -- timestamps result in a lot of redundant operations. As long as everything is idempotent, it's pretty straightforward.
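
A rough sketch of the "only trigger when checksums differ" idea, in Python rather than make. The stamp file and the `run_if_changed` helper are assumptions for illustration, not part of any tool mentioned here.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_if_changed(inputs: list[Path], stamp: Path, action) -> bool:
    """Run action() only when the content of some input differs from last run.

    `stamp` is a hypothetical JSON file recording the digests seen last time.
    Returns True if the action ran.
    """
    current = {str(p): file_digest(p) for p in inputs}
    previous = json.loads(stamp.read_text()) if stamp.exists() else {}
    if current == previous:
        return False  # same bytes as last time, even if the mtime moved
    action()
    # Record digests only after the action succeeded, so a failed run retries.
    stamp.write_text(json.dumps(current, sort_keys=True))
    return True
```

A `touch` on an input changes its timestamp but not its digest, so the step is skipped. This only pays off if the action itself is idempotent, as noted above.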

oceanplexian · on Sept 19, 2021

> Terraform modules are always a few steps behind the upstream public cloud offerings.

My experience has been the exact opposite. Usually Terraform offers support for cloud services long before the vendor provides an SDK or supports it with their own offering (e.g. Cloudformation). There are still dozens of AWS services, for example that have no CF support offered by AWS.

nrvn · on Sept 19, 2021

The emphasis on stateless here is that your desired state is described in code that resides in your repository. Actual state is what you have in your cloud. No need to spend time on state format, storage and related logic and complexity.
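
To make the stateless idea concrete, here is a minimal Python sketch. It assumes both sides can be flattened into a mapping of resource name to settings; `reconcile` and the example resources are made up, and a real tool would fill `actual` from the cloud SDK on every run.

```python
def reconcile(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Compare desired resources (from the repo) with actual ones (from the cloud).

    Nothing is persisted between runs: the cloud is queried each time and the
    result of this diff is the whole plan.
    """
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    return actions


if __name__ == "__main__":
    desired = {"web": {"size": "small"}, "db": {"size": "large"}}
    actual = {"web": {"size": "medium"}, "cache": {"size": "small"}}
    print(reconcile(desired, actual))
    # [('delete', 'cache'), ('create', 'db'), ('update', 'web')]
```

One caveat of this sketch: it treats everything in `actual` as owned by the tool, so whatever query builds that mapping has to be scoped (by tag, naming scheme or similar) or it will want to delete resources it never created.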

deimosfr · on Sept 20, 2021

I mostly agree with this, but you're not considering the time it takes to make and maintain it over time. It is non-negligible and has to be in the balance as well. But from a pure technical POV, I agree.

digianarchist · on Sept 19, 2021

You can run terraform apply against a particular resource which will only provision that resource and its dependencies.

tfstate files can be painful to manage, we had a lot of trouble with them at Capital One but mostly because:

1. People would modify state outside of TF which you should avoid.

2. People didn't architect their apps well which led to long lived infra. TF works best with cattle like infra.

terraform import feels very much like an afterthought which is why projects like terraformer exist.

phendrenad2 · on Sept 19, 2021

Terraform basically gives you what cloud providers should have. AWS/Azure are these overcomplicated web interfaces, or undocumented REST APIs, and Terraform gives you a simpler way to configure stuff.

gattacamovie · on Sept 20, 2021

If you have Kubernetes in the stack, I consider CNCF's https://www.crossplane.io the best path.

For the cases where the native integration with the cloud provider does not provide the exact parameters you are looking for, it offers the alternative of integrating with terraform in the background, while making that transparent to you.

0xbadcafebee · on Sept 19, 2021

Terraform is actually kind of a nightmare. It's deceptively simple yet requires a massive amount of real world expertise to use it properly. It's a configuration management tool, but more difficult to use and extend.

I'm thinking of designing a series of tools to replace Terraform. The idea would be to break down how modern cloud environments are managed into a couple concepts, and then make a variety of tools that work within those concepts together, so that it's easy to expand and modify the way you use them for your use case. This would enable things like tailoring the use of the tools to a particular deployment strategy, or adding custom business logic, or replacing individual functionality, without being tied to one tool, language, etc.

jrochkind1 · on Sept 19, 2021

> On terraform it's different, because of the tfstate. All the deployed elements are stored in the tfstate, re-running terraform won't update resources that are supposed to be in a specific state but are not.

Huh, is that true?

I'm just getting started with terraform but I assumed that was the idea of terraform (where it didn't happen would be a bug), and I think I had seen it happening for the few basic resources I have started out with (S3, cloudfront).

If the state doesn't match the actual configuration of S3, terraform notices, and the plan is to make it so. No? Am I confused and it hasn't been doing this?

Or is this inconsistent, true of some resources and not of others? That seems surprising. What's the idea?

blown_gasket · on Sept 19, 2021

If Terraform was used for the deployment of the infrastructure, then state IS the actual configuration of the system.

All that a plan does is evaluate what is going to change in the current Terraform state by performing a dry-run of the Terraform code that you have supplied.

If you would actually like to make changes to the Terraform state based on what the Terraform code evaluated then you run a Terraform apply - which will, for the resources deployed via Terraform, update the configurations themselves and update the Terraform state by using the Terraform code as the instruction set.

You can actually see this in action with plan and apply as the output will show you +,-, and ~ where ~ is settings that are going to change but are not new configurations or configuration to be removed.

Edit: Learned from some other comments that Terraform has a 'refresh' command that will take deploy+n time configurations done outside of Terraform and sync those configurations with the state. This might be what you ideally are looking for after deployments?

jrochkind1 · on Sept 19, 2021

Right. I guess I'm asking about what happens if state changed outside of terraform.

I thought I had seen terraform correcting it (to match what terraform thinks it should be) in some cases.

OP seems to suggest that in some cases it does and in other cases it doesn't. I would be surprised if that were inconsistent and unpredictable, and would have expected terraform to (modulo bugs) either always or never do that. And I am wondering what terraform's intent is with that.

blown_gasket · on Sept 19, 2021

Made an edit to my original comment but it may help to be here also.

'terraform refresh' may be what you are looking for. This will update the state to match current configurations that may have been done outside of Terraform.

snom380 · on Sept 19, 2021

From https://www.terraform.io/docs/cli/commands/refresh.html :

> You shouldn't typically need to use this command, because Terraform automatically performs the same refreshing actions as a part of creating a plan in both the terraform plan and terraform apply commands.
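
A toy model of what that quote describes, in Python. This is a simplified mental model and not Terraform's implementation; `refresh`, `plan` and the three dictionaries (config, recorded state, what the provider reports) are assumptions for illustration. The `+`, `-` and `~` markers follow the plan output mentioned earlier in the thread.

```python
def refresh(state: dict[str, dict], actual: dict[str, dict]) -> dict[str, dict]:
    """Replace recorded settings with what the provider reports now.

    Resources that no longer exist remotely drop out of the state.
    """
    return {name: dict(actual[name]) for name in state if name in actual}


def plan(config: dict[str, dict], state: dict[str, dict], actual: dict[str, dict]) -> list[str]:
    """Refresh first, then diff the config against the refreshed state."""
    refreshed = refresh(state, actual)
    lines = []
    for name in sorted(config.keys() | refreshed.keys()):
        if name not in refreshed:
            lines.append(f"+ {name}")  # to be created
        elif name not in config:
            lines.append(f"- {name}")  # to be destroyed
        elif config[name] != refreshed[name]:
            lines.append(f"~ {name}")  # to be changed in place
    return lines


if __name__ == "__main__":
    config = {"bucket": {"versioning": True}}
    state = {"bucket": {"versioning": True}}
    # Someone switched versioning off in the console after the last apply.
    actual = {"bucket": {"versioning": False}}
    print(plan(config, state, actual))  # ['~ bucket']
```

Without the refresh step, config and state would look identical and the drift would go unnoticed. In this model, that is also what a provider that does not read back a given attribute would look like.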

jrochkind1 · on Sept 19, 2021

I'm talking about fixing the external actual configuration that has diverged, to match what terraform config wants it to be.

However, snom380 says this is what terraform is intended to do, and does with properly implemented providers, which makes sense to me.

I'm not sure if you are talking about the same things. We -- me, snom380, and the original quote I made from OP -- are talking about what happens when external actually existing live resources have diverged from terraform's state. I understand what you said that under a "perfect" situation, this would not happen. But it does sometimes for various reasons, what the original quote from OP is talking about and what I'm talking about is what happens when it does. I think maybe you are talking about something else.

snom380 · on Sept 19, 2021

It's not true, and what you describe is correct.

I can only assume that the author has used some provider that hasn't implemented this properly (helm, I believe, is one example), or that they've run into one of the cases where terraform treats configuration/attachments to a resource as a separate resource (e.g. IAM role vs attached/inline IAM policies).

danielovichdk · on Sept 19, 2021

I run only on Azure. Nearly all my IAC is written in PowerShell utilizing the Azure CLI.

Terraform and yaml as well are so verbose and you have no clue what's going on from the local side of things.

How do you debug your terraform markup locally? My guess is you can't.

awslattery · on Sept 19, 2021

The console command [1], mixed with local variables, is quite handy for debugging locally.

1. https://www.terraform.io/docs/cli/commands/console.html

x86_64Ubuntu · on Sept 19, 2021

Seems as if people have very strong feelings about Terraform, but little actual experience using it.

httgp · on Sept 20, 2021

Bingo. While I see some valid complaints about Terraform in the comments, most of them seem to come from a place of not fully understanding how it works or the features it offers.

toyg · on Sept 19, 2021

It's not the easiest tool to grok, writing configuration files can be very verbose, and its opinions about a number of things can be very off-putting. I tried to "get into" TF many times and never actually fell in love. I like the idea, I like the cross-cloud approach, I just don't particularly like what the experience ends up being. This shit should be easier.

gjhr · on Sept 19, 2021

> When you run Terraform against AWS on the subnets part, it will create (anytime you deploy) the missing subnets

That is one of the core features of Terraform? Detecting and fixing drift is useful.

poisonta · on Sept 19, 2021

I had mostly similar feelings four years ago. As I understood the reasons behind them, I started respecting more the people at HashiCorp. They are really smart.

airocker · on Sept 19, 2021

We use this exact stack, but we generally would not rely on tfstate. We will remove everything and regenerate it. It’s not an operation done frequently enough for that to be a big problem. Also, we use helm as a separate layer that is applied after terraform and can be repeated many times. This changes often.

wayoutthere · on Sept 19, 2021

The new use case for Terraform is IT departments using it as an IaC policy control system for self-service. Rather than push teams through a web interface, you just expose Terraform via Terragrunt to the dev teams and run their files through a policy-driven linter before executing. And make it so that Terraform is the only way you can push to prod.

I think people get into trouble with Terraform when they try to use it to do more than provision infrastructure. Things that should probably be part of build scripts, CI/CD pipelines or config management. Terraform isn’t good at those things; but it is very good at provisioning cloud infra in a cloud-agnostic fashion.

pvtmert · on Sept 19, 2021

I agree with most of your comment except the cloud-agnostic part.

Heck, I never get how terraform can be cloud-agnostic... If everybody thinks having the same language (HCL) is equivalent to being cloud agnostic, YAML exists...

It is literally impossible to create a simple VM in 2 different cloud providers without defining it twice with each provider's own specific parameters.

If you use the AWS provider, resources start with aws_; if it's GCP they start with google_, and so on. It is not possible to have a "resource vm { name = ... provider = aws }"

jen20 · on Sept 19, 2021

Terraform provides the same _workflow_ across clouds, not the same resource model (which would be dumb, since it would necessarily provide only a lowest common denominator representation).

> It is not possible to have a "resource vm { name = ... provider = aws }"

It actually is via modules. It’s a lot of work with basically no benefit though, so in practice people don’t do this.

Pulumi is better at specifically this kind of thing however, since you can implement a common interface which can be specialised for each available cloud.
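
A minimal Python sketch of what such a common interface could look like. It is plain Python and does not use the Pulumi SDK; the `vm` function, the size table and the output shape are made up for illustration. `aws_instance` / `instance_type` and `google_compute_instance` / `machine_type` are existing Terraform resource and argument names, but real resources need more arguments (image, zone, disks and so on) than shown.

```python
# Assumed mapping from an abstract size to each cloud's machine type.
SIZES = {
    "aws": {"small": "t3.micro", "large": "m5.large"},
    "google": {"small": "e2-micro", "large": "e2-standard-2"},
}


def vm(name: str, size: str, cloud: str) -> dict:
    """Translate one abstract VM description into a cloud-specific resource."""
    if cloud == "aws":
        return {
            "type": "aws_instance",
            "name": name,
            "args": {"instance_type": SIZES["aws"][size]},
        }
    if cloud == "google":
        return {
            "type": "google_compute_instance",
            "name": name,
            "args": {"machine_type": SIZES["google"][size]},
        }
    raise ValueError(f"unsupported cloud: {cloud}")


if __name__ == "__main__":
    print(vm("web", "small", "aws"))
    print(vm("web", "small", "google"))
```

The sketch also shows the lowest common denominator problem: anything that exists on only one cloud has no place in the shared `vm(name, size, cloud)` signature.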

d0gsg0w00f · on Sept 19, 2021

IME, Terraform is great for "fire and forget" deployments. However, if you're trying to use it to continuously bring older existing deployed infra up to date it can get tricky. I strongly suggest self versioning your TF files and following strict pretested upgrade paths.

nanis · on Sept 19, 2021

> Terraform is great for "fire and forget" deployments.

Indeed, terraform is great for exactly the opposite: For making sure both the initial infrastructure and subsequent modifications to it are first proposed, discussed, reviewed in code and applied afterwards instead of anyone with privileges tinkering with settings on an "as needed" basis in whatever console thereby ending up with an infrastructure where you have no idea who turned on the frobnicator, why it was set to 11, and what might be the consequences of changing the setting.

d0gsg0w00f · on Sept 19, 2021

We experienced a lot of problems when trying to manage standard configurations of hundreds of AWS VPCs ranging in age from 3 days old to 6 years. Accounts built years ago using older terraform would have to be handled completely differently because inevitable TF template drift and TF versions made each upgrade path unique. Not insurmountable by any means but also not trivial. Just sharing my experience.

robertlagrant · on Sept 19, 2021

If you just use a cloud provider's UI you will have no separation between desired and actual states. Then, whatever people change it to is the only truth.

marcinzm · on Sept 19, 2021

We use Terraform for cloud infrastructure and, basically, helm deploys of external apps. In other words things that don't change too often and the update of which has to be managed carefully anyway. The internal apps which get deployed at a much faster cadence use helm directly.

danw1979 · on Sept 19, 2021

This article is not entirely incorrect, but there are some glaring falsehoods as others have pointed out.

There’s a plug for managing Helm related resources with the author’s own SaaS product at the end, so I’ll file this under “half hearted hit job”.

nojvek · on Sept 19, 2021

Yeah the tfstate was a big gotcha. Pulumi has the same problems.

What we really need is automatic reconciliation. I.e ask the provider what they have and then diff against that.

Or periodically auto-importing.

Are there any good solutions to auto-importing?

arpinum · on Sept 19, 2021

The hard part with auto-importing is the resource id. This is usually generated server-side and is not included in the hcl. Often resources define a user-supplied identifier as well, such as a name property, and this could be used for auto-importing if the property has unique constraints applied against it. However, not all resources have this feature, so it's not a universal solution.
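
A small Python sketch of that matching step, assuming the provider can list remote resources with both a server-generated `id` and a user-supplied `name`. The function and field names are hypothetical; the only rule it encodes is "import by name when the name is unique, otherwise leave it alone".

```python
from collections import Counter


def match_for_import(declared_names: list[str], remote: list[dict]) -> tuple[dict[str, str], list[str]]:
    """Pair declared resources with remote ids via a user-supplied name.

    Returns (matches, ambiguous): `matches` maps a declared name to the remote
    id it can safely be imported as; `ambiguous` lists declared names that
    occur more than once remotely and therefore cannot be matched.
    """
    counts = Counter(item["name"] for item in remote)
    id_by_name = {item["name"]: item["id"] for item in remote}
    matches, ambiguous = {}, []
    for name in declared_names:
        if counts[name] == 1:
            matches[name] = id_by_name[name]
        elif counts[name] > 1:
            ambiguous.append(name)
        # counts[name] == 0: nothing to import, the resource has to be created
    return matches, ambiguous


if __name__ == "__main__":
    remote = [
        {"id": "i-1", "name": "web"},
        {"id": "i-2", "name": "db"},
        {"id": "i-3", "name": "db"},
    ]
    print(match_for_import(["web", "db", "cache"], remote))
    # ({'web': 'i-1'}, ['db'])
```

The `ambiguous` list is where the "not a universal solution" part shows up: without a uniqueness constraint on the name there is no safe way to pick an id.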

kall · on Sept 19, 2021

Hm, maybe what we really need is a new entrant in hyperscale clouds that is built from the ground up for IaC and just does away with the split between state and reality. I would love to see one, anyway.

robertlagrant · on Sept 19, 2021

What's the point of reimporting? Aren't you then no longer separating desired state from actual?

bovermyer · on Sept 19, 2021

From the problems the author is having, it would appear that perhaps Pulumi would be the better choice in this case.

rgoulter · on Sept 19, 2021

In the article, the problems the author discusses are:

1. Inconsistent behaviour between providers. e.g. if a resource has been destroyed since the last `terraform apply`, then some resources/providers would recreate the resource, others wouldn't. (Similarly, there's not a guarantee that the state after running `terraform apply` matches up with what's there, if the provider is happy with its state file).

2. The dependencies of already-applied resources can block `terraform apply` if the upstream API for these resources suffers an outage.

3. If a `terraform apply` applies some resources before failing, this can result in an inconsistent state. Either the resources need to be deleted, or imported.

I'm not familiar with Pulumi; what aspects of these would Pulumi help with?

kall · on Sept 19, 2021

3 happens regularly and I don’t see how the other two would really be different, since some pulumi providers are using terraform providers under the hood.

htrp · on Sept 20, 2021

It's not the golden hammer, but it is a hammer in a world full of nails.

jounker · on Sept 19, 2021

What countries have fewer deaths per 100K than Sweden? All of Sweden’s neighbors.

duskwuff · on Sept 19, 2021

Wrong thread?

p2t2p · on Sept 19, 2021

My experience is that terraform sucks. A lot. Yet everything else seems to suck even more.

NKosmatos · on Sept 19, 2021

Came here to read about planet terraforming but instead I learned about (yet another) cloud deployment tool :-)

scrollaway · on Sept 19, 2021

Ah, congrats on being part of today's ten thousand.

I very highly recommend investigating it more and trying it a bit. Terraform isn't mere cloud deployment.

As a small project you can start by deploying an EC2, RDS and some cloudflare records to go with them, all linked together with terraform. This will give you an initial idea of its capabilities.

rad_gruchalski · on Sept 19, 2021

> learned about (yet another) cloud deployment tool

Please don't read it as an attack, not the intent. Amazing that a HN regular can "learn about Terraform" only in the latter part of 2021!

jhgb · on Sept 19, 2021

I made the same mistake. First hearing about this. It turns out that different people may have different backgrounds! ;)

johannes1234321 · on Sept 19, 2021

https://xkcd.com/1053/

I assume there is some selective reading and most articles referring to Terraform have a cloud or hashicorp reference in the title. If you don't care about either, you don't read the Terraform things on HN.

picardo · on Sept 19, 2021

Terraform can be frustratingly slow at times. You have to realize that at the end of the day, it's an abstraction layer on top of the public APIs of a cloud service. If all your services are hosted on a single cloud, you don't need Terraform.

We saw a huge improvement in our build times after we started using AWS CDK directly.

kenerwin88 · on Sept 19, 2021

Hmm, as a former AWS employee who has used both heavily, my experience has been the opposite.

Terraform’s AWS provider calls the APIs directly, whereas CDK generates Cloudformation, an abstraction on top of the AWS APIs. For me, using Terraform was significantly faster than applying the same stack via CDK.

Or do you mean you’re able to iterate faster writing CDK vs TF?

picardo · on Sept 19, 2021

Thinking back on it, we always used Terraform with Pulumi, which creates its own abstraction layer for a CF stack. It's hard to pinpoint where the root cause of the slowness was.. but in principle having fewer abstractions allowed us to iterate faster, and fix the bugs more quickly.

snom380 · on Sept 19, 2021

From my experience, terraform is almost always slow because it's making API calls out to the cloud providers, and a lot of that in turn is slow because many providers offer "eventually consistent" APIs, which terraform needs to compensate for by doing roundtrips to validate that changes have become visible (applies failing because of that was a common problem in the early days of terraform).
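
Roughly what those validation roundtrips look like, as a Python sketch. `wait_until_visible` and its parameters are made up for illustration and this is not how any particular provider implements it; `read` stands in for whatever API call reads the resource back.

```python
import time


def wait_until_visible(read, expected, attempts=5, delay=0.5, sleep=time.sleep):
    """Poll read() until it returns `expected`, backing off exponentially.

    Returns True once the change is visible, False if it never showed up
    within `attempts` reads. `sleep` is injectable so the loop can be tested.
    """
    for attempt in range(attempts):
        if read() == expected:
            return True
        if attempt < attempts - 1:
            sleep(delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    return False
```

Every dependent resource has to wait for a loop like this before it can be created, which is where a good chunk of the wall-clock time goes.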

ulzeraj · on Sept 19, 2021

I'd rather devote my time learning an agnostic tool like terraform. I can be part of your team right now working on AWS but tomorrow I might be working for an Azure shop.

qaq · on Sept 19, 2021

CDK synth is reasonably fast but CF is really slow

amarshall · on Sept 19, 2021

What does the AWS CDK do differently than use public APIs?

scrollaway · on Sept 19, 2021

CDK generates cloudformation stacks. Those stacks are deployed as units, within AWS itself. AWS treats all the resources as part of that stack etc; it's a concept entirely proper to AWS.

Terraform can create cloudformation stacks as well, you just have to write the resources for it. It doesn't really make sense to do that. I also don't know that it's … "faster" in any way; cfn is really slow.

orf · on Sept 19, 2021

It’s just tags on the resources and a managed statefile. There isn’t anything different between a bucket created via CDK and a bucket created via terraform, the resources are the same and the API calls to create them are also the same.