Update: The Terraform provider is now available on the Terraform Registry.
We are making sweeping and backwards-incompatible changes to the oVirt Terraform provider. We want your feedback before we make these changes.
Here’s the short list of what we would like to change; please read the details below.
- The current master branch will be renamed to v0. The usage of this provider will be phased out within Red Hat around the end of this year / beginning of next year. If you want to create a fork, we are happy to add a link to your fork to the readme.
- A new main branch will be created and a new Terraform provider written from scratch on the basis of go-ovirt-client. (Preview here.) This provider will only have limited functionality in its first release.
- This new provider will be released to the Terraform Registry, and will have full test coverage and documentation. It will be released as version v2.0.0 when ready, to signal that it is built on the Terraform SDK v2.
- A copy of this new Terraform provider will be kept in the v1 branch and backported to the Terraform SDK v1 for the benefit of the OpenShift Installer. We will not tag any releases, and we will not release this backported version in binary form.
- We are hosting a community call on the 14th of October at 13:00 UTC on this link. Please join to provide feedback and suggest changes to this plan.
Why are we doing this?
The original Terraform provider for oVirt was written four years ago by @Maigard at EMSL-MSC. The oVirt fork of this provider is about 2 years old and went through rapid expansion, adding a large number of features.
Unfortunately, this continuous rapid growth came at a price: the original test infrastructure deteriorated, and certain resources, especially the virtual machine creation, ballooned to a size we feel has become unmaintainable.
If you tried to contribute to the Terraform provider recently, you may have noticed that our review process has become extremely slow. We can no longer run the original tests, and our end-to-end test suite is not integrated outside of the OpenShift CI system. Every change to the provider requires one of only three people to review the code and also run a manual test suite that is currently only runnable on one computer.
We also noticed an increasing number of bugs reported on OpenShift on oVirt/RHV related to the Terraform provider.
Our original plan was to fix the test infrastructure first and then slowly transition API calls to go-ovirt-client, but that resulted in a PR that is over 5000 lines of code and cannot in good conscience be merged in a single piece. Splitting it up is difficult, and would likely result in broken functionality where test coverage is not present.
What are we changing for you, the users?
First of all, documentation. You can already preview the documentation here. You will notice that the provider currently only supports a small set of features. You can find the full list of features we are planning for the first release on GitHub. However, if you are using resources such as cluster creation, these will not work yet, and we recommend sticking with the old provider for the time being.
The second big change is how resources are treated. Instead of large resources that need to call several oVirt APIs to be created, we will create resources that each call only one API. This will lead to fewer bugs. For example:
- ovirt_vm will create the VM, but not attach any disks or network interfaces to it.
- ovirt_disk_attachment or ovirt_disk_attachments will attach a disk to the VM.
- ovirt_nic will create a network interface.
- ovirt_vm_start will start the virtual machine when provisioned, and stop it when deprovisioned.
You can use the depends_on meta-argument to make sure disks and network interfaces are attached before you start the VM. Alternatively, you can hot-plug network interfaces later. For example:
resource "ovirt_vm" "test" {
cluster_id = "some-cluster-id"
template_id = "some-template-id"
}
resource "ovirt_disk" "test" {
storagedomain_id = "some-storage-domain-id"
format = "cow"
size = 512
alias = "test"
sparse = true
}
resource "ovirt_disk_attachment" "test" {
vm_id = ovirt_vm.test.id
disk_id = ovirt_disk.test.id
disk_interface = "virtio_scsi"
}
resource "ovirt_vm_start" "test" {
vm_id = ovirt_vm.test.id
depends_on = [ovirt_disk_attachment.test]
}
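Network interfaces follow the same pattern. Here is a minimal sketch; attribute names such as vnic_profile_id are assumptions based on the preview documentation:

resource "ovirt_nic" "test" {
  vm_id           = ovirt_vm.test.id
  name            = "eth0"
  vnic_profile_id = "some-vnic-profile-id"
}

If the NIC must exist before the VM starts, add ovirt_nic.test to the depends_on list of ovirt_vm_start.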
The next change is the availability of the provider on the Terraform Registry. You will no longer have to download the binary. Instead, you will be able to simply pull in the provider like this:
terraform {
  required_providers {
    ovirt = {
      source  = "ovirt/ovirt"
      version = "..."
    }
  }
}

provider "ovirt" {
  # Configuration options
}
The configuration options for the provider itself have also been greatly expanded, see the preliminary documentation for details.
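For example, a minimal provider block might look like the following. The option names (url, username, tls_insecure) are assumptions here; check the preliminary documentation for the authoritative list:

variable "ovirt_password" {
  type      = string
  sensitive = true
}

provider "ovirt" {
  url      = "https://example.com/ovirt-engine/api"
  username = "admin@internal"
  password = var.ovirt_password

  # For test environments only; production setups should supply a CA
  # certificate instead of disabling verification.
  tls_insecure = true
}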
What’s changing behind the scenes?
The new Terraform provider is a complete rewrite based on the go-ovirt-client library. The single biggest advantage of this library is that it has built-in mocks for all resources it supports. Having mocks allows us to run tests without needing to spin up an oVirt instance. We have already configured GitHub Actions on the new provider and all changes are automatically checked against these mocks.
We may decide to add an end-to-end test later, but for the foreseeable future we will trust the correctness of the mocks to test community contributions. This means that we will be able to merge changes much quicker.
On the OpenShift side we will also switch to using the new provider, since this is the primary motivation for the change. The OpenShift Installer uses the legacy version 1 of the Terraform SDK, so we will maintain a version 1-compatible copy in the v1 branch, which the installer can pull in. It is important to note, however, that the v1 branch will be a pure backport; we will not develop it separately. Development will be focused on the version in main that is being released to the Terraform Registry.
What does this mean to you, the contributors?
The current Terraform provider has several pull requests open. Unfortunately, we do not have the capacity to properly vet these changes and run our internal test suite against them. Unlike the new Terraform provider, the old one does not have the working tests, linting, and code structure that make merging changes easier.
We are very sorry to say that these patches are unlikely to be merged. We know that this is a terrible thing to hear: you have put effort into writing them. Unfortunately, we do not see an alternative, as there are already numerous bugs on our radar and adding more code would not make the problem go away.
We want to hear your opinion
As the owners of the original Terraform provider, we haven’t been keeping up with reviewing your contributions and issues. Some are several months old and haven’t received an answer for a long time. We want to change that; we want to hear from you. Please join our community round table about the Terraform provider on the 14th of October at 13:00 UTC on this link.
We want to know: Which resources are the most important to you? How does this change impact you? Can we make the transition smoother for you? Would you do anything differently in the light of the issues described above?
Overall I’m glad of this change, and that it’s moving in the right direction. However, I have a few queries.
Behaviour comparison:
* Compared to the old provider, this will lead to a great increase in resource count (rather than one VM resource, I now have at least five resources representing that VM). Will this impact provider speed?
* I like the current behaviour of the old provider: when I create a new VM it starts up automatically, but the provider ignores the VM’s status in future runs. Will vm_start support this, so I can auto-start VMs on creation but not have the provider restart VMs that have been shut down?
Requests for new provider:
* I’m already managing a large production infrastructure using this provider. I would like to migrate to the new provider, but obviously this won’t be trivial/automatic due to many changes. Can you therefore ensure there is testing and documentation for importing existing resources? Previously I’ve had to import TF resources by extracting IDs from the oVirt engine database – whilst I wouldn’t expect this to necessarily be documented/supported, importing resources should be. The previous provider was a bit hit and miss with this (I raised a couple of PRs for it), and sometimes had some no-op changes to state on a freshly imported resource.
* Examples are always overly simplistic. Currently I am getting a list of VMs from an external data source and creating them in a for_each loop in TF. By separating out the resources, this all gets a lot more complicated, as I have to tie the IDs of resources together. Examples almost always do this with static strings, as yours in this post do, but that is not realistic for large deployments. Can you please ensure you give more complex examples, for example creating VMs with NICs and disks from an input JSON list?
* Ensure that it’s maintained and PRs are responded to in a timely manner! I’m aware this is one of the drivers of the change, so I’m optimistic this will be the case.
Hey @Jake Reynolds,
Thank you very much for your feedback!
1. The main drag on the speed of any TF provider is the API calls it makes, by a large margin. Not only is the number of API calls not going to change, but Terraform automatically executes them in parallel, so you may actually see the new provider being faster.
2. That is an interesting use case. You can use the ignore_changes setting in Terraform to do that in the new provider (see the first sketch after this list). This will have the added benefit that the VM will still be automatically stopped if there is a dependent resource that you only want manipulated while the VM is off (e.g. a disk attachment).
3. All resources in the new Terraform provider have automated tests for imports, so importing resources *should* work. I have already received a request for a migration guide and I have a Red Hat docs person wanting to contribute to the new TF provider, so I’m very hopeful we’ll have a documented migration procedure.
4. For the few resources we have, we have “in context” examples, for example here: https://registry.terraform.io/providers/haveyoudebuggedit/ovirt/latest/docs/resources/ovirt_disk_attachments If you have anything specific you’d like to see in an example, please open an issue on the Terraform repository once we have moved the code over. The second sketch after this list shows one way to handle the for_each case you describe.
5. Yes! Response times to PRs have been less than ideal, to say the least. The reason for that is simple: time. Currently, every PR takes a ridiculous amount of time to review because of the arcane test setup. The new provider won’t have that problem: the tests will run against the go-ovirt-client mocks on GitHub Actions automatically and use golangci-lint to verify that the code meets basic standards. From there on it’s just a sanity check by anyone with the right permissions (~a dozen people) and the PR is good to go. If any bugs crop up later as we integrate it in OpenShift that we didn’t catch with the mocks, we’ll open issues for them and fix them with separate PRs.
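For point 2, here is a minimal sketch of the ignore_changes approach. It assumes ovirt_vm_start exposes a status argument; the exact attribute name may differ in the final provider:

resource "ovirt_vm_start" "test" {
  vm_id = ovirt_vm.test.id

  lifecycle {
    # "status" is an assumed attribute name: ignoring its drift means a
    # VM that was shut down outside of Terraform is not started again
    # on the next apply.
    ignore_changes = [status]
  }
}

For point 4, here is one way to drive several VMs from a JSON input. The vms.json layout is made up for illustration; the resource attributes are the ones shown earlier in this post:

locals {
  # Hypothetical input file: a JSON list of objects with
  # "name", "cluster_id" and "template_id" keys.
  vms = jsondecode(file("${path.module}/vms.json"))
}

resource "ovirt_vm" "vms" {
  for_each    = { for vm in local.vms : vm.name => vm }
  cluster_id  = each.value.cluster_id
  template_id = each.value.template_id
}

resource "ovirt_vm_start" "vms" {
  # One start resource per VM, keyed the same way as ovirt_vm.vms.
  for_each = ovirt_vm.vms
  vm_id    = each.value.id
}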
Thank you for taking the time to write this. If you can, drop by the community call tomorrow and let’s talk more!
Janos
I’m happy we are moving forward with this change, was waiting for it for a long time.
Some feedback:
1. Could the old provider’s name be changed to “ovirt_v0” or something similar, so that users who depend on the old provider can include it in their Terraform configuration along with the new one (using two providers in the same config)? I have not seen any data source docs published in the Terraform Registry, so we could keep using data sources from the old provider until the new one has them.
2. One problem with the previous provider is that it doesn’t wait for the ovirt_vm resource to get an IPv4 address through DHCP; Terraform exits without the IP in the state. It should get the IP, save it to the state, and then finish resource creation (similar to the behaviour we find on Terraform with other cloud providers such as Azure and AWS).
3. Sometimes we have seen resources getting tainted or erroring for various reasons (no vCPU available on the hypervisor host, no IP address available, etc.). This seems to cause other Terraform commands like output or refresh to fail midway and return empty output (mostly something to do with the Terraform SDK).
Hey rwxgps,
Thank you for your feedback. Let me answer in detail:
1. Unfortunately, keeping the old provider blocks around is not possible because we are moving to the Terraform SDK v2, and the two are not compatible. They can’t even be added to the same codebase, so we can’t implement a compatibility layer either. The new provider will be built up over time, so until all the resources and data sources you need are implemented in it, I’d recommend sticking with the old one, which you can build from the v0 branch. (If you are well versed in Go, you can try to add the resources you need yourself; with go-ovirt-client it should be much easier.)
2. I agree that this would be useful, but it’s not possible within the scope of the oVirt Terraform provider. The DHCP request is sent by your VM image and is answered by your DHCP server after booting; oVirt does not participate in this exchange.
In other words, the Terraform provider has no way to determine which IP address is assigned to a certain MAC address, or when.
You would need to either a) create a Terraform provider which queries your DHCP server for this information based on the MAC address, or b) SSH into the machine and extract that information, possibly using a Terraform provisioner.
Both of these are outside the scope of the oVirt Terraform provider, as neither is an oVirt feature. You can, however, use the external data source provider to run a script that waits for the IP (see the sketch after this list). See this documentation for details: https://registry.terraform.io/providers/hashicorp/external/latest/docs/data-sources/data_source
3. You should, for the most part, no longer see these issues with the new Terraform provider. The internals of the provider have been redesigned so that one resource corresponds to one API call. Creating a functional VM will, therefore, require multiple resources to be created (VM, NIC, disk attachment, etc.). As long as the oVirt Engine doesn’t permanently refuse our requests, the TF provider should be able to provision your resources just fine and recover gracefully from any failures that happen in one of the resources.
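For point 2 above, a sketch of the external data source approach follows. The wait-for-ip.sh script is entirely hypothetical and depends on where your DHCP leases can be queried, and the mac attribute on ovirt_nic is an assumption:

data "external" "vm_ip" {
  # wait-for-ip.sh receives {"mac": "..."} as JSON on stdin, polls the
  # DHCP server until a lease for that MAC appears, and prints
  # {"ip": "..."} as JSON on stdout.
  program = ["${path.module}/wait-for-ip.sh"]

  query = {
    mac = ovirt_nic.test.mac # assumed attribute name
  }
}

output "vm_ip" {
  value = data.external.vm_ip.result.ip
}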
I hope this helps. If you have further questions, please raise them on GitHub, as keeping track of things to develop is easier there than in the comments of a blog post.
Cheers,
Janos
Hey @rwxgps,
I have a little update: I managed to figure out a way to wait for the DHCP assignment to happen. If you have the guest agent installed, the IP will be reported, and I will be adding a data source that waits for the IP to be reported. This will enable you to add a provisioner to that effect.
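A sketch of what that might look like; the data source name ovirt_vm_ip and its attributes are placeholders until the feature actually ships:

data "ovirt_vm_ip" "test" {
  # Hypothetical data source: blocks until the guest agent reports an
  # IP address for the VM, then exposes it.
  vm_id = ovirt_vm_start.test.vm_id
}

output "reported_ip" {
  value = data.ovirt_vm_ip.test.ip # placeholder attribute
}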
Cheers,
Janos
Hi @Janos,
Thanks for considering the feedback and implementing this. My current workaround is similar: wait a fixed amount of time, then query for the IP.
But sometimes it fails because DHCP is not able to assign an IP, or there is a long delay before the IP gets assigned or the agent reports it back.
The new data source for this will help out.
Thanks