It's very difficult to communicate what Terraform's strengths and weaknesses are to someone who's never used it or IAC in general.
Spend enough time playing with it and understanding it, you'll end up like me thinking about all the shit you configure left and right such as hooking up Stripe's secret keys, Google Analytics and the webmaster console, and just about everything else we configure via web interfaces, and you'll think:
Why can't we use Terraform for this as well? Manage these SaaS products the same way we manage the rest of our cloud, test and audit changes, automatically roll secrets and update anything that needs updating the moment you change a setting.
Ah well. Not enough APIs out there. And it's difficult to write and maintain terraform plugins for these throwaway cases especially if they are going to use private APIs. Anyone know if Pulumi plugins are easier to write?
I wish I could use it for everything as well. Every other tool the business depends on just scares me to shit that its config isn't in code and we can't have a proper backup of it. At least Datadog supports terraform for most things, and not only do we manage all our infra through terraform, we manage all our monitoring with it too. I doubt very much I'll ever go back to non-monitoring-as-code, if that's even a term.
Infrastructure, all the monitoring, as well as all the on-call rotation configurations (and anything else that is in that loop) should all be in code, and all changes should be reviewed the same way application code is. If they aren't, you can't really trust you're gonna be alerted properly when things start breaking.
I wish I could use it for personal things too, I'd rather have my bank account settings, my government tax information, yada yada in a personal terraform repository for example. Change of address? Commit a change, check if the plan is good and apply to change it everywhere. Though having lots of experience with Terraform I can only imagine what the equivalent of trying to delete an S3 bucket that still has data in it is for a bank account.
I so agree; I tried/am trying to write an Android provider - currently just have app (un)installation working, and not very well, I expected settings management to be the hard part, but egh.
Why can't everything have nice public APIs! And, while I'm at it, some sort of all-encompassing ticketing system, hell even if it were Jira. 'Pothole', assign local council. Blocked on '2021 roadworks funding increase', backlogged. Assigned to councillor. Won't fix. Ok - maybe I'm not making it sound great, but at least you could see some reasoning, and what the blockers are. Follow the chain to work out that 'communal lobby needs repainting' hasn't been acted on by building management company because, ultimately, of global supply chain disruption and the contractor's supplier's supplier can't get any paint ingredients.
I've used https://github.com/Mastercard/terraform-provider-restapi successfully with a cloud provider which provides a suitable HTTP API. There was a bit of fiddling with JSON formatting and their API docs, but it wasn't too hard all in all.
But like you say - now I've done that, I want to do it for every UI that I'm forced to log in to!
That's right, which is why they usually recommend not committing tfstate but using some remote backend.
We've gotten away without that because the APIs we use don't contain any secrets. (The auth token for making the requests is just an environment variable.)
And this setup is also encouraged by HashiCorp (at least I saw a talk by them).
Use ansible for your “smart” sequential executions and Terraform as a sane wrapper for state.
I’ve seen (and fixed) so many ugly messes at this point made as a result of mixing and wrapping tools with different purposes together, like Ansible + Terraform, that it’s something I strongly discourage. I recommend keeping the boundaries and responsibilities of the tools clear. In this case, Terraform for the creation of resources and Ansible for the configuration of those resources. In my opinion, this results in a much simpler and more maintainable ecosystem in the long run.
So the other way around.
Do you employ a GitOps approach this way?
I find it hard to figure out how to use GitOps with Ansible? How do you make a PullRequest which indicates that something should get deleted? You still would have to keep around an ansible playbook for the stuff you want to delete.
After a little grokking it’s a surprisingly easy way to manage arbitrary API resources. They are just functions in your language of choice and can accept parameters from any dependency. It’s also possible to roll your own provider, but dynamic providers cover all kinds of use cases.
Your intuition is right on pulumi. Creating these kinds of extensions is minimal effort. You can start out by adding a "dynamic resource" class to your infra codebase and extract it into a plugin later, or not.
These are not the same as "real" pulumi providers that run across all supported programming languages, but I think they are a good enough fit for the cases you mention.
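To make the "they are just functions" point concrete, here is a rough plain-Python sketch of the shape such a resource takes. This is deliberately not Pulumi's API: `FakeSettingsApi` and `WebhookResource` are hypothetical stand-ins, and a real dynamic resource would plug equivalent create/update/delete hooks into Pulumi's own base classes.

```python
import itertools
from typing import Dict, Tuple


class FakeSettingsApi:
    """Hypothetical in-memory stand-in for a SaaS settings API."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.objects: Dict[str, dict] = {}

    def post(self, body: dict) -> str:
        object_id = f"obj-{next(self._ids)}"
        self.objects[object_id] = dict(body)
        return object_id

    def put(self, object_id: str, body: dict) -> None:
        self.objects[object_id] = dict(body)

    def delete(self, object_id: str) -> None:
        del self.objects[object_id]


class WebhookResource:
    """Lifecycle hooks an IaC engine would call; each one is an ordinary function."""

    def __init__(self, api: FakeSettingsApi) -> None:
        self.api = api

    def create(self, props: dict) -> Tuple[str, dict]:
        # Return the new ID plus the outputs the engine should record.
        return self.api.post(props), dict(props)

    def update(self, object_id: str, new_props: dict) -> dict:
        self.api.put(object_id, new_props)
        return dict(new_props)

    def delete(self, object_id: str) -> None:
        self.api.delete(object_id)
```

The engine owns ordering and state; the resource author only has to map three calls onto whatever HTTP API the service exposes.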
> On terraform it's different, because of the tfstate. All the deployed elements are stored in the tfstate, re-running terraform won't update resources that are supposed to be in a specific state but are not.
This is incorrect, and makes me wonder how the author has used terraform. Terraform will certainly detect differences between managed and current state for the resources it manages whenever you do a plan/apply.
The major challenge is that terraform _can only reconcile resources or configuration values it knows about_, and that depends very much on how a particular cloud vendor or terraform provider has modelled resources. I believe the Helm provider is one example where it (at least in the past) hasn't had a good way to reconcile state.
This stuck out to me too. Terraform absolutely does check the current reality of the state and applies changes to do what the HCL tells it to. They are either using terraform in a really weird way, or this article was written by someone that doesn't actually run terraform themselves.
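A toy illustration of that limitation, in Python rather than anything Terraform actually runs: drift is only detected for attributes the provider models. The attribute names and the `modelled` set are made up for the example.

```python
from typing import Any, Dict, Set, Tuple


def detect_drift(
    desired: Dict[str, Any], actual: Dict[str, Any], modelled: Set[str]
) -> Dict[str, Tuple[Any, Any]]:
    """Map each modelled attribute that differs to (actual, desired)."""
    return {
        key: (actual.get(key), want)
        for key, want in desired.items()
        if key in modelled and actual.get(key) != want
    }


def invisible_attributes(actual: Dict[str, Any], modelled: Set[str]) -> Set[str]:
    """Attributes that exist on the real resource but are never compared."""
    return set(actual) - modelled


# Hypothetical bucket: someone flipped versioning and added a rule by hand.
desired = {"versioning": True, "acl": "private"}
actual = {"versioning": False, "acl": "private", "lifecycle_rule": "added-by-hand"}
modelled = {"versioning", "acl"}

print(detect_drift(desired, actual, modelled))
print(invisible_attributes(actual, modelled))
```

The hand-edited `versioning` shows up as drift, while `lifecycle_rule` is never looked at because the (imaginary) provider does not model it.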
Phew! I thought I was misunderstanding terraform when I saw that.
Perhaps they are working with certain poorly implemented or buggy providers, and not realizing those providers were doing something different than terraform's properly working behavior that most providers implemented.
Bugs happen, but the first step is agreeing on intended behavior so we know what's a bug!
It would be even better if you could somehow tell it it should have control over the entire account, so that anything (entire resources I mean, not just changed properties) created outside terraform would be destroyed.
In terms of API use though I suppose that'd be quite expensive to plan - listing every possible AWS resource (in every region!) for example.
You'd have to go per-region first to avoid a major redesign on the provider (the API is also per-region). I can see something like a `controlled_resource_types` list attribute on the provider that you could set to e.g. `[aws_instance]` to inform the provider it needs to compare the list of resources of the specified type to the state.
True. That'd be fine though, if you wanted to be really thorough you could just specify a provider for every region and only actually 'use' the ones you wanted, the others just serving as a dummy to enforce no resources.
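For what it's worth, the comparison such a `controlled_resource_types` attribute implies is small. A sketch in Python, assuming the provider can list every resource of a given type in its region; no such attribute exists today, and the IDs below are invented.

```python
from typing import Dict, List, Set


def find_unmanaged(
    listed: Dict[str, Set[str]],
    state: Dict[str, Set[str]],
    controlled_resource_types: List[str],
) -> Dict[str, Set[str]]:
    """Return resources of controlled types that exist but are not in state."""
    unmanaged: Dict[str, Set[str]] = {}
    for resource_type in controlled_resource_types:
        extra = listed.get(resource_type, set()) - state.get(resource_type, set())
        if extra:
            unmanaged[resource_type] = extra
    return unmanaged


# What the region actually contains vs. what the state file tracks.
listed = {"aws_instance": {"i-aaa", "i-bbb"}, "aws_s3_bucket": {"manual-bucket"}}
state = {"aws_instance": {"i-aaa"}}

print(find_unmanaged(listed, state, ["aws_instance"]))
```

Only types named in the list are policed, so the hand-made bucket is left alone here. The expensive part is producing `listed`, not the set difference.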
> For some resources like RDS or EKS, it won't check if the resource already exists or not. So if it's missing, nothing is going to happen as it's marked are deployed in the tfstate file
There's also brokenness around terraform for_each and providers. If you have a module that creates a Kubernetes cluster and then applies a helm chart to it, you can't convert it to one that takes a bunch of cluster definitions that it iterates over using for_each. Basically, there is no way to do this in a data-driven way. Sucks.
Surprising that the author is writing a tool for managing your servers, so writes a post about how Terraform isn't great at managing your servers...
It seems like the root of the problem is that the author wants to use Terraform to manage their AWS state, but also wants to use the web console to directly change things, so Terraform gets out of sync. Terraform has a command to handle this - https://www.terraform.io/docs/cli/commands/refresh.html
As the sibling commenter notes, Terraform has a refresh flag, but I wonder if Kubernetes’ model is better here. Rather than a one-off process that tries to update everything, Kubernetes has many small controllers which are essentially processes on the cluster that just run a control loop. Each controller corresponds to one resource type, so it will just loop over incoming events that pertain to the resource type in question and attempt to reconcile the state of the target resource instance with the desired state. If it fails initially, it will retry with back off. If something doesn’t stabilize after several minutes, an alert can notify a human.
The key differences between the controller approach and the IaC approach are, I think, lots of little processes continuously reconciling state for all resources of a given type (many small loops that touch all resources of a given type on the entire cluster) versus a one-off process that tries to touch just the resources it cares about and if it fails it just gives up.
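A stripped-down sketch of that controller pattern in Python, to show how little is in the loop itself. `ReplicaController` and the dict standing in for the cluster are hypothetical; real controllers are driven by watch events rather than called directly.

```python
import time
from typing import Callable, Dict


class ReplicaController:
    """Owns one resource kind and nudges actual toward desired, one step per call."""

    kind = "Deployment"

    def __init__(self, cluster: Dict[str, Dict[str, int]]) -> None:
        self.cluster = cluster

    def reconcile(self, name: str) -> bool:
        obj = self.cluster[name]
        if obj["actual"] < obj["desired"]:
            obj["actual"] += 1
        elif obj["actual"] > obj["desired"]:
            obj["actual"] -= 1
        return obj["actual"] == obj["desired"]


def run_until_stable(
    controller: ReplicaController,
    name: str,
    max_attempts: int = 8,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry with exponential backoff; return the number of attempts used."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if controller.reconcile(name):
            return attempt
        if attempt < max_attempts:
            sleep(delay)
            delay *= 2  # back off before the next try
    # Never stabilized: this is where a human would get alerted.
    raise RuntimeError(f"{controller.kind}/{name} did not stabilize")
```

The contrast with a one-off apply is the failure path: a failed attempt just means another, later attempt, and only persistent failure escalates.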
One thing Kubernetes definitely improves on compared to Terraform is that Kubernetes uses a YAML “assembly language” for its infra as code, but that YAML could be generated by a real programming language. Terraform expects you to write HCL, which is an accidentally re-invented programming language (every IaC tool provider thought static configs like YAML would suffice as a human interface, but as they gradually realized the need for more dynamism, Terraform and others would bolt on one dynamic feature after another until they had a slow, unfamiliar, and counterintuitive programming language). Terraform has a CDK that allows writing in other languages, but I’m skeptical that it liberates you from Terraform’s model of the world (e.g., if I rename a variable in CDK, does it try to destroy and recreate the underlying resource as with Terraform?). I’m also concerned that rather than allowing us to generate YAML in the obvious way, it will require bizarre inheritance patterns like the AWS CDK. I would be curious to hear from folks who have used the CDK.
"every IaC tool provider thought static configs like YAML would suffice as a human interface, but as they gradually realized the need for more dynamism, Terraform and others would bolt on one dynamic feature after another until they had a slow, unfamiliar, and counterintuitive programming language"
I constantly see soo many people step on the same rake, it's incredible. Tools like Tilt let you use python, it's a much more sensible approach.
The reasoning you put in also means kubernetes is unsuited to be controlled by terraform. Too many lifecycles (resources) to centrally control. Kubernetes custom resources can have dependencies on others, which terraform would need to support as well, and that is not doable to maintain. Keep your kubernetes manifests outside of your terraform state.
My company manages a lot of Kubernetes manifests with Terraform without issue. Terraform is just generating the manifests in this case; Kubernetes is doing the reconciliation work. More complex than is ideal (i.e., if we were starting out with Kubernetes we probably wouldn’t use Terraform) but it works reasonably well.
The Hashicorp co-founders considered alternative approaches when originally designing terraform. The actor model was considered but dismissed. That's not a million miles away from the kubernetes reconcile loop model:
> Then we transitioned to an actor-based model where each resource was almost an actor, and there was a message-passing interface between them.
> This allowed the system to be highly concurrent the way Terraform is today, but also confusing for users to deal with and very difficult to build a programming model around, because the ordering of execution was so random and everything was happening concurrently.
There are a lot of developments around using Kubernetes as an IaC platform for the reasons in your comment. The combination of a standard API model in CRDs + the controller model maps nicely to managing infrastructure and exposing resources to developers.
<https://crossplane.io> just graduated to CNCF Incubation and each of the cloud providers are working on K8s controllers and code generators (like Amazon Controllers for Kubernetes, Google Config Connector, and the Azure service operator).
To be clear, the issue isn’t the HCL syntax. You could similarly use Cue to generate HCL. The problem is using Terraform’s dynamic features which were poorly designed.
My biggest issue with Terraform is the impedance mismatch between HCL and the rest of the known universe.
When you write a provider, you spend half your time converting data structures from the HCL submitted to your provider into the JSON your target service inevitably expects, then you spend half your time converting the JSON your target service inevitably returns into HCL for Terraform to consume, and then you spend another half of your time fixing bugs and polishing.
It's okay when you're building simple providers, but anything reasonably complex becomes unwieldy. I had a go at building some providers for AWS services that were not supported by Terraform or CloudFormation... and I just retreated to cheesy Lambda custom resources for CloudFormation.
I read you. Half of Terraform's value today is "let's have common aesthetics - especially the examples and snake_case_naming_convention - and convert any REST API in the world to that".
If you review it carefully, it is apparent how much coding effort and many moving parts were used to perform a transformation which seems disproportionately primitive.
Unfortunately HCL is also an unnecessary learning curve for users. I still don’t have my head around a list versus a bunch of blocks with the same name, for example. I originally thought it was syntax sugar for a single mechanism, but I’ve had errors for trying to use one instead of the other before.
But I wanted to be a Terraform user. So an alternative interpretation is that, as designed, Terraform is slowing down adaptation of the known universe for Terraform users.
I’m partial to Pulumi for this reason. It allows devs to use familiar languages to define infrastructure with their familiar tools, write tests, and even interop with existing terraform.
I wish Terraform were less opinionated. It has a very clear set of rules you have to adhere to, and if you try to do anything remotely complex you will encounter barriers left and right.
An example is the fact that `for_each` is not supported on providers [1], an issue with 230 likes which has not been solved since January 2019. This had me resort to a Python script which generates a `.tf.json` file, definitely not ideal. Infrastructure as code sounds great, but in practice it's closer to "infrastructure as a non-standard markup language".
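For anyone curious what that workaround looks like, a minimal sketch of such a generator follows. The layout reflects my understanding of Terraform's JSON configuration syntax (a list of provider blocks with aliases, and `provider` given as a string on each resource), so check it against the docs; the regions and bucket names are made up.

```python
import json
from typing import Dict, List


def render_tf_json(regions: List[str]) -> Dict[str, dict]:
    """Build one aliased AWS provider per region plus a bucket that uses it."""
    providers = []
    buckets = {}
    for region in regions:
        alias = region.replace("-", "_")
        providers.append({"alias": alias, "region": region})
        buckets[f"logs_{alias}"] = {
            "provider": f"aws.{alias}",            # points at the aliased provider
            "bucket": f"example-logs-{region}",    # hypothetical bucket name
        }
    return {
        "provider": {"aws": providers},
        "resource": {"aws_s3_bucket": buckets},
    }


if __name__ == "__main__":
    document = render_tf_json(["us-east-1", "eu-west-1"])
    # Redirect this into e.g. generated.tf.json next to the hand-written .tf files.
    print(json.dumps(document, indent=2, sort_keys=True))
```

It works, but the loop that HCL refuses to express now lives in a second language and a build step, which is the "definitely not ideal" part.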
You have to understand that when IaC was new, the marketing was “it’s so simple you can just write YAML/JSON/etc” because frankly the industry was too dumb to understand that “using a real programming language to generate a description of the desired resource state” and “using a real programming language to imperatively reconcile the current and desired states oneself” are different things. So Terraform began with something that resembled YAML in its static-ness, and over time, more power was required so they would bolt on a dynamic feature but were reluctant to give the impression that they were building a programming language so the feature would be as obscure as possible. But that wouldn’t be enough either so they would add still more dynamic features, each comparably obscure until in time they’d built a complete, obscure programming language.
But this wasn’t just Terraform! The entire industry did this too. CloudFormation began as simple JSON, but over time they allowed you to encode the abstract syntax tree of a shitty programming language in your YAML, and CloudFormation would interpret it. However stupid that may sound, in the Kubernetes world, we have Helm which lets you generate YAML with text templates which is honestly the dumbest idea in the world (imagine a compiler that generates syntactically invalid machine code if the input program has an extra white space character).
Of course in all of these cases the answer is staring us in the face: use a static language (YAML, JSON, etc) to describe the desired state, and use a higher level language (like Python or Starlark or Dhall or etc) to generate that static desired state description. The only thing Terraform (or any IaC tool) should care about is the YAML description. That it is generated from Starlark or TypeScript is just an implementation detail.
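As a small example of that split, here is a Python function that emits a static Kubernetes Deployment description. It prints JSON rather than YAML only to stay within the standard library (JSON is a subset of YAML); the service names and images are placeholders.

```python
import json
from typing import Dict, List, Tuple


def deployment(name: str, image: str, replicas: int) -> dict:
    """Return a plain-data apps/v1 Deployment; no templating, no string splicing."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def render(services: Dict[str, Tuple[str, int]]) -> List[dict]:
    """All the dynamism (loops, defaults, functions) happens here, in Python."""
    return [
        deployment(name, image, replicas)
        for name, (image, replicas) in sorted(services.items())
    ]


if __name__ == "__main__":
    services = {
        "web": ("registry.example.invalid/web:1.0", 3),
        "worker": ("registry.example.invalid/worker:1.0", 2),
    }
    for manifest in render(services):
        print(json.dumps(manifest, indent=2))
```

Because the output is built as data and serialized at the end, a stray space cannot produce a syntactically invalid document the way a text template can.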
Instead of that, though, we get CDKs which are so close, but admittedly I haven’t used them in anger yet.
One of the best parts of CloudFormation was their introduction of Macros. You can take either your whole template or just a snippet, and perform dynamic transformations by calling a lambda. I'll admit I've gone so far as being able to embed ERB (Ruby) into my templates in order to more dynamically define some resources based on stack parameters. I can also create N resources with common configuration based on the values of a CommaDelimitedList.
I think the idea here is that macros are neat in any language, but in CloudFormation they can help automate stuff that is only difficult because of CloudFormation, and the macros themselves are harder to use than those in a normal programming language. In all cases, I think it’s strictly less nice than generating your CloudFormation YAML with Python or similar.
The problem of generating your own YAML is you end up having to maintain multiple copies of nearly identical templates and keeping them in S3. I have done a bit of that and maintaining those as build-time assets rather than run-time assets is less appealing. Granted, that's required whenever you get into the realm of dynamically-determined sets of parameters.
I think this is less that it is opinionated but more that the HCL evaluation feels like a pile of hacks. There are unclear rules on what can be evaluated when and what dependencies are possible. Part of this is so that `plan` can work as it does, but it seems like there are just major gaps in general. For example providers can't depend on resources. This makes it very difficult to for example set up EKS then use the kubernetes provider to manage the resources in the cluster. The solution is obviously separate stacks but that brings in a whole bunch of other problems.
I think Terraform is quite possibly the best tool available, but there are clear flaws with both the model and the implementation. I think if I were to make a Terraform v2 I would make `plan` completely pure. This would avoid the provider issues, make validation and testing in CI easier and a whole bunch of other benefits. Of course there are downsides. For example EC2 instance IDs are random so you can't just include them in your pure plan. You would need some type of placeholder that is used for evaluation. This does cause some issues as it limits the operations that you can do with that value (so you can't pick the instance size based on the random instance ID) but overall I don't think it would be a major issue if the final substitution was handled well by the framework.
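A toy version of the placeholder idea, in Python: `plan` makes no API calls and leaves `Unknown` markers where values such as instance IDs will only exist after creation, and `apply` substitutes them as it goes. This is a sketch of the proposal, not how Terraform works internally; it also assumes the config is already listed in dependency order, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Unknown:
    """Placeholder for a value only known after apply, e.g. 'instance.id'."""

    ref: str


Step = Tuple[str, Dict[str, Any]]


def plan(config: Dict[str, Dict[str, Any]]) -> List[Step]:
    """Pure: same config in, same steps out, placeholders left untouched."""
    return [(name, dict(attrs)) for name, attrs in config.items()]


def apply(steps: List[Step], create: Callable[[str, Dict[str, Any]], str]) -> Dict[str, str]:
    """Create resources in order, substituting placeholders with real outputs."""
    outputs: Dict[str, str] = {}
    for name, attrs in steps:
        resolved = {
            key: outputs[value.ref] if isinstance(value, Unknown) else value
            for key, value in attrs.items()
        }
        outputs[f"{name}.id"] = create(name, resolved)
    return outputs


config = {
    "instance": {"size": "small"},
    "dns_record": {"target": Unknown("instance.id")},
}
```

The restriction mentioned above is visible in the types: during `plan` an `Unknown` can be passed along, but nothing can branch on its value.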
Terraform definitely has its warts, though, as other commenters wrote, not everything in the article is true (the reconciliation part): dependency resolution blows up in time as your number of resources grows, so you need to split up your statefiles; it can't passively listen for drift happening in a dataflow-like way (that would be awesome); it's not transactional like CloudFormation (which is more of a tradeoff than a con), and more.
It is however a great improvement over the previous ways of doing things, and probably the best out of the current similar alternatives out there (you might mention Pulumi as a strong contender, especially for AWS glue writing).
And - though as per the disclaimer, I may be biased - until a better tool comes up, I'd advise looking for specialized IaC CI/CD tools to ease your path with Terraform, like Spacelift[0].
It can help you with orchestrating dependencies among multiple state files; take care of scheduling regular drift detection/reconciliation without getting in your way and locking your state; give you a policy system for making sure preventable mistakes don't happen (e.g. recreating a resource you definitely never want to recreate); manage your credentials depending on whether you just want to run a plan or apply your changes; and much more.
I can't imagine doing Terraform again without a tool like it.
Disclaimer: Software Engineer at Spacelift. If you want me to expand on the "and much more" part, you can find a demo-scheduling link in my bio!
The author seems to have some misunderstandings on how Terraform is supposed to work - you should get the "automatic reconciliation" they're saying is missing. Also,
> I run once again the “terraform apply” command. But for some reason, Cloudflare API doesn't answer and I got completely stuck there without the possibility to update with Terraform this field because of linked dependencies.
You should be able to circumvent this with a `-target`.
That being said, I know exactly what they're talking about with helm. IME the helm provider was/is a complete mess and gets inconsistent state a lot. Helm specifically I would also keep out of TF until that is fixed, if ever. I haven't had that happen with other providers, though. Perhaps OP was just really unlucky ending up with the odd half-broken AWS module.
An alternative opinion: When I worked at a large tech company, my team made a conscious decision to not use terraform. This gave us some key advantages- we are able to adopt new cloud features immediately, months before they were available in tf, and our direct cloud access let us build features that would surprise the teams using tf within the company.
If your core competency isn't dependent on your cloud platform tf is a great tool. But using cloud APIs directly was great for us.
This is fine, I've done it extensively myself for some of the bleeding-edge cloud stuff, but the importance of things like tracking state, managing hierarchical resource dependencies, or retry/back-off logic shouldn't be tossed aside simply because there are gaps in what's available in the Terraform providers. Especially where change management is important (basically any enterprise company).
I'd caution others reading this against abandoning something altogether and writing bespoke IaC tooling simply because the stable approach doesn't cover every (bleeding) edge case.
You'll spend a lot of time reinventing the wheel, and while it's fine for certain situations (like when you only care about desired state, not known state, for instance), you'll move faster (and likely safer) by sticking with tools like Terraform for the bulk of your infra, and augmenting here and there with cloud APIs/SDKs when needed.
Yes, we did have to implement our own state tracking, retries/recovery, etc- but since we were focused on a limited subset of the cloud API, this was pretty easy.
So you re-implemented terraform but worse most likely.
Also you could have added those missing features and re-used the TF engine; it's very simple to add a new API to an existing provider.
No, we used an entirely different architecture/paradigm not possible with tf, and had capabilities tf doesn't attempt to provide (such as coordinating migrations with both cloud and application APIs, or managing capacity while upgrading 10000s of CPU cores worth of compute).
Would the time spent re-implementing a specialised Terraform subset be better spent simply maintaining a private branch of the AWS Provider? You can add your secret/special API without having to do all the other heavy lifting as well.
This makes your own effort for customisation minimal, keeps your knowledge portable, and because your added features can be separated into different files and the provider API is stable, you can also easily backport/fast-forward new changes.
Our custom code was fewer LOC, more robust and faster than tf. Honestly, it's not hard to beat OSS if you have a team of great engineers. This was far from the only time our internal projects were better than OSS- it gave us advantages for removing bottlenecks in the critical paths.
So when engineers left the company, was their knowledge portable? And attracting new talent, did they come with X years knowledge for the custom code?
With fewer LOC I'm not sure what you mean, the provider code is pretty small, smaller than custom ansible, salt or puppet modules. Smaller than CDK and Pulumi as well. Sure, you'll have to write Go, but that's about the only hurdle.
Everyone doing a round of NIH for internal tooling is ultimately not making the tide rise.
Edit: don't get me wrong here, writing internal tools to do a job the right way for the right needs isn't "invalid" or something like that, but people often dismiss the rest of the lifecycle of knowledge and maintenance when making something completely custom.
Same here. It's incredible how much effort developers are willing to put into popular "devops" tools when the job could be done faster with 200 lines of Python.
eg. AWS ChatBot not available in TF yet.
TBH AWS haven't even added it to their Go SDK. So, I cannot blame TF. But anyway that's one of the inherent problems of the TF plugin system.
Compare to kubectl. Where you can write plugins in bash/shell and mark with execute bit, put it in somewhere in your $PATH as kubectl-blabla and use it as "kubectl blabla".
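The lookup being described is roughly a PATH search for an executable called `kubectl-<name>`. A Python imitation of just that step, assuming a POSIX system; it mimics the naming convention, not kubectl's actual implementation, and `install_demo_plugin` exists only to make the demo self-contained.

```python
import os
import shutil
import stat
import tempfile
from typing import Optional, Tuple


def find_plugin(command: str, search_path: Optional[str] = None) -> Optional[str]:
    """Return the path of an executable named kubectl-<command>, or None."""
    return shutil.which(f"kubectl-{command}", path=search_path)


def install_demo_plugin(directory: str, command: str) -> str:
    """Write a tiny shell-script plugin and set its execute bit."""
    plugin = os.path.join(directory, f"kubectl-{command}")
    with open(plugin, "w") as handle:
        handle.write("#!/bin/sh\necho 'hello from the plugin'\n")
    os.chmod(plugin, os.stat(plugin).st_mode | stat.S_IXUSR)
    return plugin


def demo() -> Tuple[bool, Optional[str]]:
    """Install a plugin in a scratch directory and look it up by name."""
    with tempfile.TemporaryDirectory() as tmp:
        installed = install_demo_plugin(tmp, "blabla")
        found = find_plugin("blabla", tmp)
        return found == installed, find_plugin("missing", tmp)
```

That is the whole contract: a file name and an execute bit. There is no schema, state, or retry handling, which is both why it is so easy and why it is not equivalent to a Terraform provider.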
It's not fair to compare imperative simple shell scripts with the things Terraform does. It has schema validation, state comparison, retries, failure handlers etc.
NAT Gateways is another notable feature that took CloudFormation months yet Terraform had on day 1.
If you can configure something via CloudFormation you can integrate it via Terraform et al also, since they have resources representing CloudFormation stacks.
This! It's not like you can't go beyond what Terraform offers by default. Running CloudFormation stacks from Terraform is a neat way of working around missing APIs/integrations. And that's exactly what my team did when Terraform was missing a lot of Lambda functionality. We just declared the CloudFormation stack for the lambdas and then called it from Terraform.
There’s no reason you can’t do something similar with Terraform either - plugins speak GRPC and thus could be implemented in Python, with Node.js or with Rust.
However, if AWS have not published metadata for a given service to be used across their various SDKs, it’s hard to take that service particularly seriously, so I’m not sure I’d bother with this.
One of the biggest reasons that has kept me away from terraform, apart from the esoteric language, is that terraform modules are always a few steps behind the upstream public cloud offerings.
In the sense that whenever there’s a new API or service available in any of public clouds and their official SDKs there will always be a delay before this new service/feature/API will become available in terraform.
First time I encountered it with GKE private clusters 3 or 4 years ago. Now it is AWS Keyspaces.
The second biggest reason is that whenever you have a requirement for hybrid or multicloud, well, you are left with the rigidity of HCL. It is probably doable but for what sake?
Solution: get a real language, write a STATELESS configuration management(IaC) system for your own needs and maintain it. The majority of public and private cloud providers ship SDKs in most popular languages that will help you build your own software solution and reduce your dependence on a third party which I would put under progressing operational risk category. Yaml/json/cue/toml for end user configs would suffice.
Example: for one of my previous projects we built a tool for a hybrid AWS-OpenStack setup, and were managing a dozen busy environments.
This is my preference as well. I've done everything from makefiles and bash scripts to a monolith Go program that statelessly provisions/tears down resources.
Even makefiles are pretty straightforward, though you really want operations to only trigger when checksums differ -- timestamps result in a lot of redundant operations. As long as everything is idempotent, it's pretty straightforward.
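The checksum gate is easy to bolt on outside of make. A small Python sketch, assuming one input file per operation and a sidecar "stamp" file holding the last applied SHA-256; the file names are invented.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Callable, List


def run_if_changed(source: Path, stamp: Path, action: Callable[[Path], None]) -> bool:
    """Run action(source) only when the content hash differs from the recorded one."""
    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    if stamp.exists() and stamp.read_text() == digest:
        return False  # same content, even if the timestamp moved
    action(source)
    stamp.write_text(digest)  # record only after the action succeeded
    return True


def demo() -> List[bool]:
    """First run, a bare touch, then a real edit."""
    with tempfile.TemporaryDirectory() as tmp:
        source, stamp = Path(tmp, "network.json"), Path(tmp, "network.sha256")
        source.write_text("{}")
        applied: List[Path] = []
        results = [run_if_changed(source, stamp, applied.append)]
        os.utime(source)  # touch: new mtime, identical content
        results.append(run_if_changed(source, stamp, applied.append))
        source.write_text('{"cidr": "10.0.0.0/16"}')
        results.append(run_if_changed(source, stamp, applied.append))
        return results
```

Writing the stamp after the action means a failed run is retried next time, which is only safe because the action is assumed to be idempotent.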
> Terraform modules are always a few steps behind the upstream public cloud offerings.
My experience has been the exact opposite. Usually Terraform offers support for cloud services long before the vendor provides an SDK or supports it with their own offering (e.g. Cloudformation). There are still dozens of AWS services, for example that have no CF support offered by AWS.
The emphasis on stateless here is that your desired state is described in code that resides in your repository. Actual state is what you have in your cloud. No need to spend time on state format, storage and related logic and complexity.
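In sketch form, a stateless tool boils down to diffing the repo's description against a live listing. The Python below assumes ownership is marked with a tag so that deletes never touch resources the tool did not create; the tag key, names, and attributes are all hypothetical.

```python
from typing import Dict, List

OWNER_TAG = "managed-by"   # hypothetical tag key
OWNER = "infra-repo"       # hypothetical tag value


def owned(actual: Dict[str, dict]) -> Dict[str, dict]:
    """Keep only live resources carrying our ownership tag."""
    return {
        name: res
        for name, res in actual.items()
        if res.get("tags", {}).get(OWNER_TAG) == OWNER
    }


def reconcile_plan(desired: Dict[str, dict], actual: Dict[str, dict]) -> Dict[str, List[str]]:
    """Compare desired (from the repo) with actual (from the cloud API); no state file."""
    mine = owned(actual)
    return {
        "create": sorted(desired.keys() - mine.keys()),
        "update": sorted(n for n in desired.keys() & mine.keys() if desired[n] != mine[n]),
        "delete": sorted(mine.keys() - desired.keys()),
    }


tags = {OWNER_TAG: OWNER}
desired = {
    "web": {"size": "small", "tags": tags},
    "db": {"size": "large", "tags": tags},
}
actual = {
    "web": {"size": "medium", "tags": tags},       # drifted
    "old-cache": {"size": "small", "tags": tags},  # removed from the repo
    "legacy": {"size": "small", "tags": {}},       # not ours, left alone
}

print(reconcile_plan(desired, actual))
```

The price of having no state file is that every run pays for a full listing, and anything that cannot be tagged or listed cannot be managed this way.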
I mostly agree with this, but you're not considering the time it takes to make and maintain it over time. It is non-negligible and has to be in the balance as well. But from a pure technical POV, I agree.
Terraform basically gives you what cloud providers should have. AWS/Azure are these overcomplicated web interfaces, or undocumented REST APIs, and Terraform gives you a simpler way to configure stuff.
For the cases where the native integration with the cloud provider does not provide the exact parameters you look for, it provides the alternative to integrate with terraform in the background, while making it transparent to you.
Terraform is actually kind of a nightmare. It's deceptively simple yet requires a massive amount of real world expertise to use it properly. It's a configuration management tool, but more difficult to use and extend.
I'm thinking of designing a series of tools to replace Terraform. The idea would be to break down how modern cloud environments are managed into a couple concepts, and then make a variety of tools that work within those concepts together, so that it's easy to expand and modify the way you use them for your use case. This would enable things like tailoring the use of the tools to a particular deployment strategy, or adding custom business logic, or replacing individual functionality, without being tied to one tool, language, etc.
> On terraform it's different, because of the tfstate. All the deployed elements are stored in the tfstate, re-running terraform won't update resources that are supposed to be in a specific state but are not.
Huh, is that true?
I'm just getting started with terraform but I assumed that was the idea of terraform (where it didn't happen would be a bug), and I think I had seen it happening for the few basic resources I have started out with (S3, cloudfront).
If the state doesn't match the actual configuration of S3, terraform notices, and the plan is to make it so. No? Am I confused and it hasn't been doing this?
Or is this inconsistent, true of some resources and not of others? That seems surprising. What's the idea?
If Terraform was used for the deployment of the infrastructure, then state IS the actual configuration of the system.
All that a plan does is evaluate what is going to change in the current Terraform state by performing a dry-run of the Terraform code that you have supplied.
If you would actually like to make changes to the Terraform state based on what the Terraform code evaluated then you run a Terraform apply - which will, for the resources deployed via Terraform, update the configurations themselves and update the Terraform state by using the Terraform code as the instruction set.
You can actually see this in action with plan and apply, as the output will show you +, -, and ~, where + is a new configuration, - is a configuration to be removed, and ~ is a setting that is going to change in place.
Edit: Learned from some other comments that Terraform has a 'refresh' command that will take changes made outside of Terraform after the deployment and sync them into the state. This might be what you are looking for after deployments?
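A toy sketch of that +/-/~ classification, in Python rather than anything Terraform actually runs. The `plan` function and the resource names are made up for illustration, and real plans compare far more than flat dictionaries.

```python
def plan(desired, state):
    """Classify each resource name as '+' (create), '-' (destroy) or '~' (change in place)."""
    changes = []
    for name in sorted(desired.keys() | state.keys()):
        if name not in state:
            changes.append(("+", name))   # in the code, not yet in the state
        elif name not in desired:
            changes.append(("-", name))   # in the state, removed from the code
        elif desired[name] != state[name]:
            changes.append(("~", name))   # in both, but the settings differ
    return changes


# Hypothetical resources: what the code asks for vs. what the state records.
desired = {"bucket": {"versioning": True}, "cdn": {"ttl": 300}}
state = {"bucket": {"versioning": False}, "old_queue": {"fifo": True}}

changes = plan(desired, state)
for symbol, name in changes:
    print(symbol, name)
```

Resources that are identical on both sides produce no entry at all, which is why a plan against unchanged code comes back empty.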
Right. I guess I'm asking about what happens if state changed outside of terraform.
I thought I had seen terraform correcting it (to match what terraform thinks it should be) in some cases.
OP seems to suggest that in some cases it does and in other cases it doesn't. I am surprised if that is inconsistent and unpredictable, and would have expected terraform to (modulo bugs) either always or never do that. And am wondering what terraform's intent is with that.
Made an edit to my original comment but it may help to be here also.
'terraform refresh' may be what you are looking for. This will update the state to match current configurations that may have been done outside of Terraform.
> You shouldn't typically need to use this command, because Terraform automatically performs the same refreshing actions as a part of creating a plan in both the terraform plan and terraform apply commands.
I'm talking about fixing the external actual configuration that has diverged, to match what terraform config wants it to be.
However, snom380 says this is what terraform is intended to do, and does with properly implemented providers, which makes sense to me.
I'm not sure if you are talking about the same things. We -- me, snom380, and the original quote I made from OP -- are talking about what happens when external actually existing live resources have diverged from terraform's state. I understand what you said that under a "perfect" situation, this would not happen. But it does sometimes for various reasons, what the original quote from OP is talking about and what I'm talking about is what happens when it does. I think maybe you are talking about something else.
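To make the two things concrete, here is a rough Python model of the difference between syncing the state to reality and correcting reality to match the config. Everything in it (`refresh`, `plan`, `apply`, the bucket attributes) is a simplified stand-in, not Terraform's real implementation.

```python
import copy


def refresh(state, real):
    """Re-read every tracked resource so the state mirrors what actually exists."""
    return {name: copy.deepcopy(real[name]) for name in state if name in real}


def plan(config, state):
    """Resources whose recorded attributes differ from what the config asks for."""
    return {name: attrs for name, attrs in config.items() if state.get(name) != attrs}


def apply(config, state, real):
    """Refresh first, then push the config onto anything that has drifted."""
    refreshed = refresh(state, real)
    for name, attrs in plan(config, refreshed).items():
        real[name] = copy.deepcopy(attrs)
        refreshed[name] = copy.deepcopy(attrs)
    return refreshed


config = {"bucket": {"acl": "private"}}
state = {"bucket": {"acl": "private"}}
real = {"bucket": {"acl": "public-read"}}  # someone changed it by hand in the console

stale_plan = plan(config, state)                 # compares against the old state only
fresh_plan = plan(config, refresh(state, real))  # compares against reality
new_state = apply(config, state, real)
```

In this model, a plan without a refresh sees nothing to do, while a plan after a refresh notices the hand edit and the apply reverts it. A provider that does not read an attribute back correctly behaves like the first case for that attribute.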
I can only assume that the author has used some provider that hasn't implemented this properly (helm, I believe, is one example), or that they've run into one of the cases where terraform treats configuration/attachments to a resource as a separate resource (e.g. IAM role vs attached/inline IAM policies).
Bingo. While I see some valid complaints about Terraform in the comments, most of them seem to come from a place of not fully understanding how it works or the features it offers.
It's not the easiest tool to grok, writing configuration files can be very verbose, and its opinions about a number of things can be very off-putting. I tried to "get into" TF many times and never actually fell in love. I like the idea, I like the cross-cloud approach, I just don't particularly like what the experience ends up being. This shit should be easier.
I had mostly similar feelings four years ago. As I understood the reasons behind them, I started respecting the people at HashiCorp more. They are really smart.
We use this exact stack, but we generally would not rely on tfstate. We will remove everything and regenerate it. It's not an operation done frequently enough for that to be a big problem. Also, we use helm as a separate layer that is applied after terraform and can be repeated many times. This changes often.
The new use case for Terraform is IT departments using it as an IaC policy control system for self-service. Rather than push teams through a web interface, you just expose Terraform via Terragrunt to the dev teams and run their files through a policy-driven linter before executing. And make it so that Terraform is the only way you can push to prod.
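A minimal sketch of what such a pre-execution policy check could look like, in Python. It assumes the plan has been exported as JSON and loaded into a dict with a `resource_changes` list whose entries carry `address`, `type` and `change.actions`; the `lint_plan` function and both rules are invented for illustration and are not any particular policy tool.

```python
def lint_plan(plan, forbidden_types=frozenset(), allow_delete=False):
    """Return a list of human-readable policy violations for a plan dict."""
    violations = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions == ["no-op"]:
            continue  # nothing would change, so nothing to police
        if rc["type"] in forbidden_types:
            violations.append(f"{rc['address']}: type {rc['type']} is not allowed")
        if "delete" in actions and not allow_delete:
            violations.append(f"{rc['address']}: deletes need manual approval")
    return violations


# Hypothetical plan submitted by a dev team.
plan = {"resource_changes": [
    {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
     "change": {"actions": ["create"]}},
    {"address": "aws_iam_user.bob", "type": "aws_iam_user",
     "change": {"actions": ["create"]}},
    {"address": "aws_db_instance.main", "type": "aws_db_instance",
     "change": {"actions": ["delete", "create"]}},
]}

violations = lint_plan(plan, forbidden_types={"aws_iam_user"})
for line in violations:
    print(line)
```

The pipeline would refuse to run the apply step whenever the returned list is non-empty.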
I think people get into trouble with Terraform when they try to use it to do more than provision infrastructure. Things that should probably be part of build scripts, CI/CD pipelines or config management. Terraform isn’t good at those things; but it is very good at provisioning cloud infra in a cloud-agnostic fashion.
I agree with most of your comment except the cloud-agnostic part.
Heck, I never get how terraform can be cloud-agnostic... If everybody thinks having the same language (HCL) is equivalent to being cloud-agnostic, YAML exists...
It is literally impossible to create a simple VM in 2 different cloud providers without defining it twice with each provider's specific parameters.
If you use the AWS provider, resources start with aws_, if it's GCP they start with google_ and so on. It is not possible to have a "resource vm { name = ... provider = aws }"
Terraform provides the same _workflow_ across clouds, not the same resource model (which would be dumb, since it would necessarily provide only a lowest common denominator representation).
> It is not possible to have a "resource vm { name = ... provider = aws }"
It actually is via modules. It’s a lot of work with basically no benefit though, so in practice people don’t do this.
Pulumi is better at specifically this kind of thing however, since you can implement a common interface which can be specialised for each available cloud.
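For what it's worth, the "common interface specialised per cloud" idea looks roughly like this in plain Python. This is not Pulumi's API: the `VM` classes, the `vm` helper and the size table are all hypothetical, and the output is just a dict naming the provider-specific resource type.

```python
from abc import ABC, abstractmethod


class VM(ABC):
    """Cloud-neutral description of a virtual machine (illustrative only)."""

    def __init__(self, name, size):
        self.name = name
        self.size = size

    @abstractmethod
    def render(self):
        """Translate the neutral description into provider-specific settings."""


class AwsVM(VM):
    SIZES = {"small": "t3.micro"}  # assumed mapping, pick your own

    def render(self):
        return {"type": "aws_instance", "name": self.name,
                "instance_type": self.SIZES[self.size]}


class GoogleVM(VM):
    SIZES = {"small": "e2-micro"}  # assumed mapping, pick your own

    def render(self):
        return {"type": "google_compute_instance", "name": self.name,
                "machine_type": self.SIZES[self.size]}


PROVIDERS = {"aws": AwsVM, "google": GoogleVM}


def vm(name, size, provider):
    """The 'resource vm { name = ... provider = aws }' shape from the comment above."""
    return PROVIDERS[provider](name, size).render()


aws_vm = vm("web", "small", "aws")
google_vm = vm("web", "small", "google")
```

The catch is visible in the size table: every neutral setting needs a hand-maintained mapping per cloud, which is the lowest-common-denominator problem mentioned upthread.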
IME, Terraform is great for "fire and forget" deployments. However, if you're trying to use it to continuously bring older existing deployed infra up to date it can get tricky. I strongly suggest versioning your TF files yourself and following strict pretested upgrade paths.
> Terraform is great for "fire and forget" deployments.
Indeed, terraform is great for exactly the opposite: For making sure both the initial infrastructure and subsequent modifications to it are first proposed, discussed, reviewed in code and applied afterwards instead of anyone with privileges tinkering with settings on an "as needed" basis in whatever console thereby ending up with an infrastructure where you have no idea who turned on the frobnicator, why it was set to 11, and what might be the consequences of changing the setting.
We experienced a lot of problems when trying to manage standard configurations of hundreds of AWS VPCs ranging in age from 3 days old to 6 years. Accounts built years ago using older terraform would have to be handled completely differently, because inevitable TF template drift and differing TF versions made each upgrade path unique. Not insurmountable by any means but also not trivial. Just sharing my experience.
If you just use a cloud provider's UI you will have no separation between desired and actual states. Then, whatever people change it to is the only truth.
We use Terraform for cloud infrastructure and, basically, helm deploys of external apps. In other words things that don't change too often and the update of which has to be managed carefully anyway. The internal apps which get deployed at a much faster cadence use helm directly.
The hard part with auto-importing is the resource id. This is usually generated server-side and is not included in the hcl. Often resources define a user-supplied identifier as well, such as a name property, and this could be used for auto-importing if the property has a uniqueness constraint applied to it. However, not all resources have this feature, so it's not a universal solution.
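A small Python sketch of that name-based matching, under the assumption that the remote API can list resources with both a server-generated `id` and a user-supplied `name`. The `match_for_import` function and the sample data are made up.

```python
from collections import Counter


def match_for_import(declared_names, remote):
    """Pair declared resources with remote ids, but only where the name is unambiguous."""
    counts = Counter(r["name"] for r in remote)
    by_name = {r["name"]: r["id"] for r in remote}
    matched, ambiguous = {}, []
    for name in declared_names:
        if counts[name] == 1:
            matched[name] = by_name[name]   # safe: exactly one remote candidate
        elif counts[name] > 1:
            ambiguous.append(name)          # name is not unique, cannot pick an id
        # counts[name] == 0: nothing to import, the resource would be created
    return matched, ambiguous


# Hypothetical listing returned by a cloud API.
remote = [
    {"id": "i-0a1", "name": "web"},
    {"id": "i-0b2", "name": "worker"},
    {"id": "i-0c3", "name": "worker"},
]

matched, ambiguous = match_for_import(["web", "worker", "db"], remote)
```

Here "web" can be imported, "worker" cannot because two remote resources share the name, and "db" simply does not exist yet.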
Hm, maybe what we really need is a new entrant in hyperscale clouds that is built from the ground up for IaC and just does away with the split between state and reality. I would love to see one, anyway.
In the article, the problems the author discusses are:
1. Inconsistent behaviour between providers. e.g. if a resource has been destroyed since the last `terraform apply`, then some resources/providers would recreate the resource, others wouldn't. (Similarly, there's not a guarantee that the state after running `terraform apply` matches up with what's there, if the provider is happy with its state file).
2. The dependencies of already-applied resources can block `terraform apply` if the upstream API for these resources suffers an outage.
3. If a `terraform apply` applies some resources before failing, this can result in an inconsistent state. Either the resources need to be deleted, or imported.
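One way the situation in point 3 can arise, sketched in Python: the remote resource gets created, but the call fails before the tool can record it. The `apply` function, the `ApplyError` class and the resource names are illustrative assumptions, not how any provider is actually written.

```python
class ApplyError(Exception):
    """Raised when a create call fails partway through an apply."""


def apply(resources, cloud, state, fail_after_create):
    """Create resources in order; simulate a lost response for one of them."""
    for name in resources:
        cloud.add(name)  # the remote side effect happens first
        if name == fail_after_create:
            raise ApplyError(f"lost response while creating {name}")
        state.add(name)  # recorded only after a clean response


cloud, state = set(), set()
error = None
try:
    apply(["vpc", "subnet", "instance"], cloud, state, fail_after_create="subnet")
except ApplyError as err:
    error = str(err)

untracked = cloud - state  # exists remotely, unknown to the state
```

After the failure, "subnet" exists in the cloud but not in the state, so the next run would try to create it again unless it is imported or deleted by hand, and "instance" was never attempted.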
I'm not familiar with Pulumi; what aspects of these would Pulumi help with?
3 happens regularly and I don't see how the other two would really be different, since some pulumi providers are using terraform providers under the hood.
Ah, congrats on being part of today's ten thousand.
I very highly recommend investigating it more and trying it a bit. Terraform isn't mere cloud deployment.
As a small project you can start by deploying an EC2, RDS and some cloudflare records to go with them, all linked together with terraform. This will give you an initial idea of its capabilities.
I assume there is some selective reading and most articles referring to Terraform have a cloud or hashicorp reference in the title. If you don't care about either, you don't read the Terraform things on HN.
Terraform can be frustratingly slow at times. You have to realize that at the end of the day, it's an abstraction layer on top of the public APIs of a cloud service. If all your services are hosted on a single cloud, you don't need Terraform.
We saw a huge improvement in our build times after we started using AWS CDK directly.
Hmm, as a former AWS employee who has used both heavily, my experience has been the opposite.
Terraform’s AWS provider calls the APIs directly, whereas CDK generates Cloudformation, an abstraction on top of the AWS APIs. For me, using Terraform was significantly faster than applying the same stack via CDK.
Or do you mean you’re able to iterate faster writing CDK vs TF?
Thinking back on it, we always used Terraform with Pulumi, which creates its own abstraction layer for a CF stack. It's hard to pinpoint where the root cause of the slowness was, but in principle having fewer abstractions allowed us to iterate faster, and fix the bugs more quickly.
From my experience, terraform is almost always slow because it's making API calls out to the cloud providers, and a lot of that in turn is slow because many providers offer "eventually consistent" APIs, which terraform needs to compensate for by doing roundtrips to validate that changes have become visible (applies failing because of that was a common problem in the early days of terraform).
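Roughly what those validation roundtrips amount to, as a Python sketch. The `EventuallyConsistentStore` class fakes an API whose reads lag behind writes, and `wait_until_visible` with its retry counts and delays is an arbitrary illustration, not what any provider actually does.

```python
import time


def wait_until_visible(read, expected, attempts=5, delay=0.01):
    """Poll read() until it returns the expected value; return the number of reads used."""
    for attempt in range(1, attempts + 1):
        if read() == expected:
            return attempt
        time.sleep(delay * attempt)  # back off a little more on each retry
    raise TimeoutError(f"value not visible after {attempts} reads")


class EventuallyConsistentStore:
    """Fake API whose first few reads after a write still return the old value."""

    def __init__(self, lag_reads):
        self._old = None
        self._new = None
        self._lag = lag_reads

    def write(self, value):
        self._new = value

    def read(self):
        if self._lag > 0:
            self._lag -= 1
            return self._old
        return self._new


store = EventuallyConsistentStore(lag_reads=2)
store.write("enabled")
attempts_needed = wait_until_visible(store.read, "enabled")
```

Every extra read is a network roundtrip in real life, so a plan touching hundreds of resources pays this cost hundreds of times.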
I'd rather devote my time learning an agnostic tool like terraform. I can be part of your team right now working on AWS but tomorrow I might be working for an Azure shop.
CDK generates cloudformation stacks. Those stacks are deployed as units, within AWS itself. AWS treats all the resources as part of that stack etc; it's a concept entirely proper to AWS.
Terraform can create cloudformation stacks as well, you just have to write the resources for it. It doesn't really make sense to do that. I also don't know that it's … "faster" in any way; cfn is really slow.
It’s just tags on the resources and a managed statefile. There isn’t anything different between a bucket created via CDK and a bucket created via terraform, the resources are the same and the API calls to create them are also the same.