---
categories: [terraform]
date: 2026-03-30 00:00:00 +0000 UTC
lastmod: 2026-03-30 00:00:00 +0000 UTC
publishdate: 2026-03-30 00:00:00 +0000 UTC
series: [Terraform]
slug: terraform-cloudflare-provider-v4-to-v5-migration
tags: [terraform, cloudflare, migration, ai, cursor, devops]
title: How to Migrate the Terraform Cloudflare Provider from v4 to v5 Safely
---

This migration covered 3 environments, more than 50 resource types, and well above 300 Terraform resources.

The team maintaining this environment was basically one contractor. So the operator's goal was simple: get access to new Cloudflare provider features without breaking production.

Cloudflare is too close to the edge to migrate casually. DNS, WAF, rulesets, Zero Trust, tunnels, redirects. If the process is messy, the feedback loop gets slow very quickly.

I treated this as a controlled edge migration: phased rollout, no auto-apply, test before prod, and scripted state repair for resources the provider could not upgrade cleanly.

So before touching provider `v5`, I set up the working model first:

1. block dangerous commands
2. start with read-only Cloudflare access
3. work on a local copy of state first
4. dry-run the migration tool
5. only then think about import, state cleanup, and apply

This post is mostly about how to set up the migration so engineers can get through it with less guesswork. It is not a full resource-by-resource migration guide.

{{< notice tip "Long story short" >}}
Do not start this migration with broad Cloudflare permissions and direct access to the normal remote backend.  
Start read-only, block `terraform apply`, pull state locally, and expect manual cleanup.
{{< /notice >}}

## Why This Mattered to the Business

This was not only a Terraform upgrade.

This was a Day 2 infrastructure problem.

At the beginning, most teams are happy just because IaC exists. Fine, we are cool. But then product pressure goes up, urgent changes happen, people do click ops, someone says they will clean it later, and the Terraform coverage starts sliding. First it is 100%, then 90%, then 80%, and after that every change becomes slower and less trustworthy.

That is the real risk.

If the edge configuration is not represented correctly in code, then:

- delivery gets slower
- production changes get harder to review
- new engineers need more tribal knowledge
- the company becomes dependent on the memory of one operator

So the migration mattered because it pulled the edge layer back into a shape where the business can keep moving without relying on click ops.

## What This Unlocks Beyond Terraform

Another reason this mattered: Cloudflare is pushing hard beyond classic CDN and DNS use cases. The developer platform is now a real product surface, not just a side feature.

If the provider layer is outdated or half-managed manually, it becomes harder to adopt what Cloudflare is actually investing in.

The migration helps keep the company ready for things like:

- Workers AI for running inference on Cloudflare's network
- AI Gateway for observability, caching, retries, rate limiting, and fallback for AI applications
- Vectorize for vector search and retrieval workloads
- Durable Objects for stateful coordination and real-time systems
- Agents SDK for stateful agents with scheduling, tools, and human-in-the-loop flows
- Hyperdrive for connecting Workers to existing regional databases with better global performance

That matters because it keeps the path open for building application features on the same platform that already sits in front of production traffic.

For a small company, that is leverage.

The value is not "we upgraded Terraform". The value is "we are in a position to adopt new platform capabilities without first untangling old infra debt."

## Start with a Read-Only Cloudflare Token

The next part is authentication.

I provide a read-only Cloudflare token first. Not admin. Not "temporary full access". Just enough access to inspect what already exists.

Just read-only.

Why?

- I want discovery first
- provider refresh and `terraform plan` normally need only API reads
- I want to inspect what exists before allowing any write path
- if the token leaks somewhere, the blast radius is much smaller

For this stage I only need visibility into the objects already managed by Terraform: zones, DNS, rulesets, WAF objects, Zero Trust resources, and similar things depending on the stack.

If later I need write access, I switch credentials only after the diff is reviewed by a human. Discovery and API lookups use the read-only token. Mutation happens in a separate reviewed phase.
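
A minimal sketch of a pre-flight check for this stage, assuming a user-owned API token and Cloudflare's token verification endpoint (`/user/tokens/verify`). The function names are hypothetical. Note that it only confirms the token is active. It does not prove the token is read-only, so the permission list still needs a look in the dashboard.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 when a "verify token" response (JSON on stdin) reports an
# active token, non-zero otherwise.
token_is_active() {
  jq -e '.success == true and .result.status == "active"' >/dev/null
}

# Hypothetical wrapper: calls the verify endpoint with the token from the
# environment. ASSUMPTION: a user-owned token, not an account-owned one.
verify_cloudflare_token() {
  : "${CLOUDFLARE_API_TOKEN:?CLOUDFLARE_API_TOKEN is not set}"
  curl -fsS \
    -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
    "https://api.cloudflare.com/client/v4/user/tokens/verify" \
    | token_is_active
}

# Usage:
#   verify_cloudflare_token && echo "token is active"
```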

## Keep the Authentication Boring

I prefer the auth model to be boring and explicit. Usually it is just:

```bash
export CLOUDFLARE_API_TOKEN="..."
```

The token should come from a proper secret source:

- local secret manager
- CI secret store
- short-lived shell session

And it should not be committed to the repo or copied into prompt text.

This is not the place to be creative.

## The Migration Plan I Followed

My notes ended up being very close to this sequence:

1. upgrade from `4.52.0` to `4.52.5` first
2. run `tf-migrate` in dry-run mode
3. apply the HCL rewrite and review every changed `.tf` file
4. fix renamed or removed resources manually where needed
5. switch to provider `~> 5`
6. repair state issues
7. apply in `test` first
8. only after that touch `prod`

I also prefer to split the work into several PRs:

- PR1: transitional provider upgrade to `4.52.5`
- PR2: `tf-migrate` HCL rewrite plus manual fixes
- PR3: provider `~> 5`, state cleanup, import flow, final validation

This keeps the diff readable, makes CI output easier to understand, and gives engineers a cleaner checkpoint after every phase.

It also reduces delivery risk. When one phase goes wrong, I know exactly where to stop, revert, or re-plan instead of carrying one giant migration diff through the whole stack.
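
One checkpoint that is easy to script is confirming that the lock file really resolved to the transitional release before moving on to the next PR. The sketch below assumes the standard `.terraform.lock.hcl` layout and the `registry.terraform.io/cloudflare/cloudflare` source address. Both function names are made up for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Prints the Cloudflare provider version recorded in a dependency lock file.
locked_cloudflare_version() {
  local lock_file="${1:-.terraform.lock.hcl}"
  awk '
    /^provider "registry\.terraform\.io\/cloudflare\/cloudflare"/ { in_block = 1; next }
    in_block && /^}/ { in_block = 0 }
    in_block && $1 == "version" { gsub(/"/, "", $3); print $3; exit }
  ' "$lock_file"
}

# Fails unless the lock file pins the expected release.
#   $1 = expected version, $2 = lock file (optional)
require_locked_version() {
  local expected="$1" actual
  actual="$(locked_cloudflare_version "${2:-.terraform.lock.hcl}")"
  if [[ "$actual" != "$expected" ]]; then
    echo "Expected cloudflare provider $expected, lock file has '$actual'" >&2
    return 1
  fi
  echo "cloudflare provider locked at $actual"
}

# Usage (after `terraform init -upgrade` on the first PR branch):
#   require_locked_version "4.52.5"
```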

## Pull State and Work Locally First

One practical lesson from this migration: I do not want to start by experimenting against the normal backend.

First I pull the state locally and prepare a local work mode:

```bash
./tf.sh state pull > migration.tfstate
ENVIRONMENT="test"  # set to the environment you are migrating
cp "${ENVIRONMENT}.tfvars" "${ENVIRONMENT}.auto.tfvars"
```

I copy the environment tfvars to `*.auto.tfvars` simply to make local Terraform runs load the same environment-specific values without adding extra flags to every command.

Then I temporarily switch the backend:

```hcl
terraform {
  backend "local" {
    path = "migration.tfstate"
  }
}
```

This part matters a lot.

The moment I know state cleanup, imports, and provider schema upgrades may be involved, I want a local copy first. It gives me a safer place to inspect, test, and understand the damage before touching the normal backend flow.

It also gives a faster feedback loop. That matters because this migration is not one command. It is many small iterations.

One obvious warning here: local Terraform state may contain secrets. Treat that local file accordingly.
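
With the state on disk, a quick inventory shows how many Cloudflare resource types are actually in play. This is a sketch that assumes the version 4 state file layout (a top-level `resources` array with `mode` and `type` fields) and that `jq` is installed. It counts resource blocks, not individual `count` or `for_each` instances. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Prints "<resource_type><TAB><count>" for every managed Cloudflare
# resource type found in a pulled state file.
cloudflare_state_inventory() {
  local state_file="${1:-migration.tfstate}"
  jq -r '
    [.resources[]
      | select(.mode == "managed" and (.type | startswith("cloudflare_")))
      | .type]
    | group_by(.)
    | map("\(.[0])\t\(length)")
    | .[]
  ' "$state_file"
}

# Restricts the local state copy to the current user, since it may hold secrets.
protect_state_file() {
  chmod 600 "${1:-migration.tfstate}"
}

# Usage:
#   protect_state_file migration.tfstate
#   cloudflare_state_inventory migration.tfstate
```

Keeping `migration.tfstate` out of version control (for example through `.gitignore`) is a sensible companion to the `chmod`.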

## Dry Run the Migration Tool First

Before changing the provider constraint, I run the migration tool in dry-run mode:

```bash
tf-migrate migrate --source-version v4 --target-version v5 --dry-run --config-dir .
```

And the warnings are the interesting part.

In this migration, the dry run showed exactly where manual work was still required. The main categories were:

- application-scoped Access policies
- removed resources in `v5`
- resources that would need state cleanup and re-import

That is already a very good result. The tool does not need to finish the migration. It just needs to show where engineers should spend manual review time.
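
To turn the dry-run output into a review list, the log can be saved and grouped by resource type. The sketch below is deliberately generic because the exact `tf-migrate` output format is not shown here: it ASSUMES that each warning is a single line containing the word "warn" (in any case) and a `cloudflare_*` resource type. Adjust the patterns to whatever the tool really prints.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Prints "<resource_type><TAB><count>" for cloudflare_* types mentioned on
# lines that contain "warn" in a saved dry-run log, most frequent first.
# ASSUMPTION: one warning per line, with the resource type on that line.
summarize_dry_run_warnings() {
  local log_file="${1:-dry-run.log}"
  { grep -i 'warn' "$log_file" || true; } \
    | { grep -oE 'cloudflare_[a-z0-9_]+' || true; } \
    | sort | uniq -c | sort -rn \
    | awk '{ print $2 "\t" $1 }'
}

# Usage:
#   tf-migrate migrate --source-version v4 --target-version v5 \
#     --dry-run --config-dir . 2>&1 | tee dry-run.log
#   summarize_dry_run_warnings dry-run.log
```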

## What `tf-migrate` Did Not Finish

A few warnings from the dry run were especially important.

Application-scoped Access policies could not be migrated automatically. In `v5`, those policies need to live inline inside `cloudflare_zero_trust_access_application`.

`cloudflare_split_tunnel` was removed and had to move to device profile configuration.

`cloudflare_zone_settings_override` was removed too. The migration generated per-setting resources, but the old state still had to be removed and the new resources had to be imported correctly.

There were also a few field-level changes. For example, `min_days_for_renewal` disappeared from origin CA certificate resources.

This is why I treat the migration tool output as the first pass, not as the final migration.
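
The items above can also be located directly in the HCL, independent of the tool. A small sketch, assuming a `grep` that supports `-r` and `--include` (GNU and BSD both do). The pattern list only covers the constructs named in this section and the function name is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Lists file:line:content matches for constructs that need manual work:
# removed resource types and the removed min_days_for_renewal field.
find_v5_manual_work() {
  local config_dir="${1:-.}"
  grep -rnE --include='*.tf' \
    -e 'resource[[:space:]]+"cloudflare_(split_tunnel|zone_settings_override)"' \
    -e '^[[:space:]]*min_days_for_renewal[[:space:]]*=' \
    "$config_dir" || true
}

# Usage:
#   find_v5_manual_work .
```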

## Expect Manual State Cleanup

This migration is not only about renaming resources.

Some failures happen because the old state payload cannot be decoded correctly by the `v5` provider. So Terraform fails before you even get a useful diff.

I saw this pattern on resources such as:

- Zero Trust gateway policies
- load balancer monitors
- zones and objects related to zone settings

The errors looked like provider decode problems, for example:

- `rule_settings`: expected object, got array
- `header`: expected object, got array
- `plan`: expected object, got string

When that happens, the path is usually:

1. back up the state
2. remove only the failing addresses from state
3. import them again with the `v5` format
4. re-run plan

This is one more reason why I like the local backend step first. It gives engineers room to repair state deliberately instead of rushing through it.
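
The four steps can be wrapped in one small helper so that every repair follows the same order. This is a sketch for a supervised shell session against the local state copy, not something to hand to an assistant. The `run` and `repair_address` names and the `DRY_RUN` and `STATE_FILE` variables are made up for illustration, and the import ID must already be in the format the `v5` provider expects for that resource.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Runs a command, or only prints it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Backs up the local state, removes one address, re-imports it, then plans.
#   $1 = resource address, $2 = import ID in the v5 format
repair_address() {
  local address="$1" import_id="$2"
  local state_file="${STATE_FILE:-migration.tfstate}"
  local backup="${state_file}.$(date +%Y%m%d%H%M%S).bak"

  run cp "$state_file" "$backup"
  run terraform state rm "$address"
  run terraform import "$address" "$import_id"
  run terraform plan -input=false
}

# Usage (print the commands first, then run them for real):
#   DRY_RUN=1 repair_address 'cloudflare_load_balancer_monitor.default' "$ACCOUNT/$MONITOR"
#   repair_address 'cloudflare_load_balancer_monitor.default' "$ACCOUNT/$MONITOR"
```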

## Some Cloudflare Resources Need Manual Review Anyway

The migration tool helps a lot, but some resources still need human attention.

The ones I would watch first are:

- Access policies attached to applications
- split tunnel configuration
- zone settings overrides
- rulesets
- load balancer resources

For example, application Access policies are not just a rename problem. In `v5`, some of them need to move into inline policies on the application resource. That is not something I want to trust to an automatic rewrite without review.

Zone settings are another good example. Old override-style resources may turn into many per-setting resources. That often means imports and explicit state cleanup, not just HCL edits.

## Use Small Shell Scripts as Migration Helpers

One thing that helped a lot was using small disposable shell scripts instead of trying to remember every `state rm` and every import format.

I would recommend this to anyone doing the same migration.

Not because the scripts are fancy. The opposite. They are boring, explicit, and easy to review.

I ended up with three useful categories of scripts:

1. scripts that remove stale state entries
2. scripts that import renamed resources back into state
3. scripts that query the Cloudflare API and match objects automatically

All IDs in the examples below are dummy values. They are here only to show the expected shape.

## Example 1: Bulk State Cleanup Script

For resources that obviously had to be removed from legacy state, I prefer a helper like this:

```bash
#!/usr/bin/env bash
set -euo pipefail

terraform state rm \
  'cloudflare_ruleset.example' \
  'cloudflare_access_policy.example' \
  'cloudflare_zone_settings_override.this' \
  'cloudflare_worker_domain.this' \
  'cloudflare_tunnel_virtual_network.default'
```

This is much safer than typing a long list manually while you are tired and already many plans deep into the migration.
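
A possible refinement, sketched under the assumption that `terraform state rm` reports an error for an address that is no longer in state: filter the list against `terraform state list` first, so the script can be re-run safely after a partial cleanup. The file name `stale-addresses.txt` and both function names are hypothetical, and `mapfile` needs bash 4 or newer.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Prints only the wanted addresses that appear in a state listing.
#   $1 = file with one wanted address per line
#   stdin = output of `terraform state list`
addresses_in_state() {
  local wanted_file="$1"
  grep -Fx -f "$wanted_file" || true
}

# Removes the wanted addresses that are still present in state.
remove_stale_addresses() {
  local wanted_file="$1" present=()
  mapfile -t present < <(terraform state list | addresses_in_state "$wanted_file")
  if (( ${#present[@]} == 0 )); then
    echo "Nothing to remove"
    return 0
  fi
  terraform state rm "${present[@]}"
}

# Usage:
#   remove_stale_addresses stale-addresses.txt
```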

## Example 2: Environment-Aware Import Script

For resources that exist in all environments but have different IDs, I like a small wrapper that detects the AWS account and chooses the correct import ID.

```bash
#!/usr/bin/env bash
set -euo pipefail

ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"

case "$ACCOUNT_ID" in
  "123456789012")
    ENV_NAME="test"
    CLOUDFLARE_ACCOUNT_ID="7f3c5a0b1d4e6f8899aabbccddeeff00"
    RESOURCE_ID="2c9d4a8f7b6e5d4c3b2a1908fedcba76"
    ;;
  "210987654321")
    ENV_NAME="prod"
    CLOUDFLARE_ACCOUNT_ID="7f3c5a0b1d4e6f8899aabbccddeeff00"
    RESOURCE_ID="8a7b6c5d4e3f2109fedcba9876543210"
    ;;
  *)
    echo "Unsupported account: $ACCOUNT_ID" >&2
    exit 1
    ;;
esac

echo "Detected environment: $ENV_NAME"
terraform import cloudflare_load_balancer_monitor.default "$CLOUDFLARE_ACCOUNT_ID/$RESOURCE_ID"
```

That pattern was useful for load balancer monitors, pools, load balancers, worker domains, and a few other resources.

## Example 3: Parse Plan and Re-Import Existing Rulesets

Rulesets were more interesting.

Sometimes Terraform wanted to create a ruleset that already existed in Cloudflare. In that case, I do not want to import by hand one by one if the plan already contains enough metadata to identify the object.

So another useful helper script pattern is:

1. read `plan.txt`
2. find `cloudflare_ruleset.* will be created`
3. extract zone ID, name, phase, and description
4. call the Cloudflare API
5. resolve the matching ruleset ID
6. run `terraform import`

Very rough shape:

```bash
#!/usr/bin/env bash
set -euo pipefail

PLAN_FILE="${1:-plan.txt}"
ZONE_ID="f1e2d3c4b5a697887766554433221100"
RULESET_ID="9b8a7c6d5e4f32100123456789abcdef"

# parse plan output here
# call Cloudflare API here
# match by zone_id + name + phase
# terraform import "cloudflare_ruleset.example" "zones/$ZONE_ID/$RULESET_ID"
```
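
The two commented-out middle steps could look roughly like the sketch below. It assumes the plan was saved without color codes (`terraform plan -no-color > plan.txt`), that the zone rulesets list endpoint returns a `result` array with `id`, `name`, and `phase` fields, and that `jq` is available. The function names are hypothetical, and pagination is not handled.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Prints the address of every cloudflare_ruleset the plan wants to create,
# including addresses nested in modules.
planned_ruleset_addresses() {
  local plan_file="${1:-plan.txt}"
  sed -nE 's/^[[:space:]]*# (([^ ]*\.)?cloudflare_ruleset\.[^ ]+) will be created$/\1/p' "$plan_file"
}

# Prints the ID of the ruleset matching name + phase in a "list rulesets"
# response read from stdin. Prints nothing unless exactly one entry matches.
match_ruleset_id() {
  local name="$1" phase="$2"
  jq -r --arg name "$name" --arg phase "$phase" '
    [.result[] | select(.name == $name and .phase == $phase) | .id]
    | if length == 1 then .[0] else empty end
  '
}

# Usage (zone ID, name, and phase are placeholders):
#   planned_ruleset_addresses plan.txt
#   curl -fsS -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
#     "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/rulesets" \
#     | match_ruleset_id "default" "http_request_firewall_custom"
```

Refusing to print an ID on an ambiguous match is intentional: importing the wrong ruleset is worse than importing one by hand.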

This is one of those places where automation actually saves time instead of adding risk.

More importantly, it reduces team dependency.

Without these scripts, the migration would live mostly in one engineer's memory. With them, another engineer can follow the same sequence, understand the shape of the repair work, and repeat it without reverse-engineering the whole stack from scratch.

## Put Tools Behind Guardrails

I still use code assistants for this kind of work. They are useful for scanning many `.tf` files, detecting renamed resources, summarizing warnings, and preparing repetitive edits.

But I keep the boundary simple:

- read the repository
- read provider docs
- prepare code changes
- summarize migration warnings
- never apply infrastructure changes on its own

If you use Cursor, `beforeShellExecution` is one of the easiest controls to add. I use it as a deny layer before the command is executed.

```json
"beforeShellExecution": [
  {
    "command": "~/.cursor/hooks/block-apply.sh"
  }
]
```

Very small hook:

```bash
#!/usr/bin/env bash
set -euo pipefail

input="$(cat 2>/dev/null || echo '{}')"
cmd="$(echo "$input" | jq -r '.command // .cmd // .shell_command // empty')"

case "$cmd" in
  *"terraform apply"*|*"terraform destroy"*|*"terraform import"*|*"terraform state rm"*|*"terraform state mv"*|*"auto-approve"*)
    echo "Blocked by policy during migration window: $cmd" >&2
    exit 2
    ;;
esac
```

That was enough for the first stage. The tool could still scan modules, compare `v4` and `v5` resources, prepare refactors, and produce review notes, but it could not jump straight to mutation.

Important detail: this deny policy is for the assistant-driven exploration and diff-preparation phase.

Later, once the review is done, I run the approved `terraform import`, `terraform state rm`, and `terraform apply` steps myself in a separate supervised shell session. The hook is there to stop premature mutation, not to ban the whole migration workflow forever.
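
One gap worth checking in a setup like this: a substring match on `terraform apply` does not catch the same command issued through a wrapper such as `./tf.sh apply`, or with a global option such as `-chdir=...` placed before the subcommand. A hedged sketch of a slightly broader matcher follows. The function name is hypothetical, the sketch assumes the wrapper passes its arguments straight through to Terraform, and `state push` is an addition that is not in the hook above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 when a command string looks like a mutating Terraform call,
# either direct or through a tf.sh wrapper, allowing leading -options.
is_mutating_command() {
  local cmd="$1"
  local re='(^|[^[:alnum:]_])(terraform|tf\.sh)[[:space:]]+(-[^[:space:]]+[[:space:]]+)*(apply|destroy|import|state[[:space:]]+(rm|mv|push))([[:space:]]|$)'
  [[ "$cmd" =~ $re || "$cmd" == *"auto-approve"* ]]
}

# Usage inside the hook, in place of the case statement:
#   if is_mutating_command "$cmd"; then
#     echo "Blocked by policy during migration window: $cmd" >&2
#     exit 2
#   fi
```

It is still a pattern match on a string, so it is a speed bump for the assistant and not a security boundary. The read-only token remains the real control.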

## What Helped the Most

If I reduce the whole experience to a few practical points, these are the things that helped most:

- use `test` first and keep `prod` behind review
- run `tf-migrate` in dry-run mode before changing the provider version
- expect some `state rm` plus `terraform import` work
- keep helper scripts for repetitive import flows
- keep CI plans running during every phase
- remove `-auto-approve` for the migration window

None of this is complicated, but together it makes the migration much easier to get through.

It also lowers maintenance cost later. Repeated state repair or import logic stops being a custom one-time ritual and starts becoming documented operational tooling.

## The First Phase Workflow

Before any write-capable step, I want the work loop to be very small:

1. read the Terraform code
2. run the dry-run migration
3. prepare code changes
4. run safe checks like `terraform fmt` and `terraform validate`
5. prepare import and state-cleanup commands for review
6. review the diff

The first goal is not to "finish the migration". The first goal is to remove uncertainty.

## Test Before Prod, and Keep CI Running

Another useful note from the migration: `test` first, always.

My rollout rule was:

1. migrate and apply in `test`
2. make sure the post-apply plan is clean
3. keep CI planning both `test` and `prod`
4. only then allow the `prod` path

During migration, temporary `prod` plan instability can happen because of intermediate rename and state steps. That is acceptable for a short period. Blind `prod` apply is not.

Also, I keep `-auto-approve` out of the flow completely for this migration window.
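
The "post-apply plan is clean" check can be made mechanical with `terraform plan -detailed-exitcode`, which exits with 0 when there are no changes, 1 on error, and 2 when changes are pending. A minimal sketch, with a hypothetical function name:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 when the plan is empty and terraform's own exit code otherwise
# (2 = changes pending, 1 = error).
#   $1 = file to keep the plan output in (defaults to /dev/null)
assert_clean_plan() {
  local plan_log="${1:-/dev/null}" rc=0
  terraform plan -detailed-exitcode -input=false -no-color >"$plan_log" || rc=$?
  case "$rc" in
    0) echo "Plan is clean" ;;
    2) echo "Plan still has pending changes" >&2 ;;
    *) echo "terraform plan failed (exit $rc)" >&2 ;;
  esac
  return "$rc"
}

# Usage, in the test environment after apply:
#   assert_clean_plan post-apply-plan.txt
```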

## Result

The upgrade path was not fully automatic, but the combination of dry-run migration, local state work, scripted imports, and staged rollout made it predictable enough to execute safely.

That was the real objective. Not to make the migration look elegant, but to make it pass without breaking production and without turning one engineer's memory into the only runbook.

For me this is the fun part of infrastructure work. I am comfortable owning technical risk when the process is clear and the rollback path is real.

From different angles, the result was:

- better path to adopt newer Cloudflare platform capabilities
- less dependence on click ops and one-person memory
- safer rollout shape for a production edge migration
- clearer signal that this environment can be maintained by another engineer later

## Define the Reset Path Before You Need It

One more practical point: define the way back before the first apply.

After local repair work, I want an explicit reset path:

- restore the normal backend block
- reinitialize Terraform
- migrate backend metadata if needed
- remove temporary files like copied tfvars, scratch plans, and local state artifacts

If you skip this part, the migration gets messy very quickly. Temporary files pile up, people forget which plan is the latest one, and it becomes much easier to make a bad decision.
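
A sketch of the file cleanup half of that reset, using only the file names from the earlier sections plus `migration.tfstate.backup`, which the local backend writes next to the state file. The function name is hypothetical. Run it only after any repaired state you intend to keep has reached the real backend.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Removes the temporary files created for local migration work.
#   $1 = environment name (matches <environment>.auto.tfvars)
#   $2 = working directory (defaults to the current one)
cleanup_migration_artifacts() {
  local env_name="$1" dir="${2:-.}"
  rm -f -- \
    "$dir/${env_name}.auto.tfvars" \
    "$dir/migration.tfstate" \
    "$dir/migration.tfstate.backup" \
    "$dir/plan.txt"
}

# Usage, after restoring the original backend block by hand:
#   cleanup_migration_artifacts test
#   terraform init -reconfigure
```

`terraform init -reconfigure` re-reads the backend block without copying any state, while `terraform init -migrate-state` attempts to copy state between backends. Which one fits depends on whether the local copy was only a sandbox or holds repairs that must be kept.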

## Final Thoughts

For me, the hard part of this migration is not the HCL rewrite. The hard part is keeping the process predictable enough that other engineers can follow it too.

So the rule is simple:

- read-only first
- local state first
- dry-run first
- small helper scripts instead of manual repetition
- human review before mutation

That is how I prefer to start a Cloudflare provider `v4` to `v5` migration and help the next engineer get through it faster.
