Chat, files, and Terraform — building AI on Azure

On 14 January I spoke at SysOps/DevOps Wrocław MeetUp #25 — Chat, files, and Terraform: behind the scenes of building an AI solution in Azure. Polish talk, no public repo to clone — this post is the English write-up. Recording on YouTube.

The project was a file-aware chat: upload internal documents, ask questions, get answers grounded in those files. We built it twice — first as classic RAG on Azure OpenAI plus Azure AI Search, then as an Azure AI Foundry Agent. The second path was better for file handling; the first path was easier to reason about in Terraform. Platform engineers will recognize the pattern: the product story moves faster than the IaC provider story.

The first version was deliberately boring architecture. Azure OpenAI for the model, Azure AI Search for indexes, indexers, and skillsets, an application layer that sent retrieved chunks into the prompt instead of whole documents every time. Storage for source files, RBAC so keys never reached the browser, orchestration on the server.

That stack is well documented. You can prototype prompts in OpenAI Studio, prove retrieval in Search, then wire the same shape into application code. Terraform coverage for the Azure side — OpenAI account, Search service, storage, networking — is mature enough that a platform team can treat it like any other Azure workload.

Where it got interesting was not the RAG loop itself but the operational edges: index refresh when files change, chunk size and overlap tuning, and making sure the isolation model in the app matched what Search actually indexed. Those are product decisions that show up in Terraform as variables and pipeline steps, not as a single azurerm_* resource.

We wanted repeatability: dev, test, and a path toward production without clicking through three portals. AzureRM handles the foundation — resource group, OpenAI, Search, storage, private endpoints where required. That part felt familiar if you have already provisioned data-plane services on Azure.

The friction was in the AI-specific glue. Some resources changed shape between API versions while we were building. Documentation examples often assume portal-first setup; translating that into modules took longer than the Azure OpenAI resource itself. Dependencies between Search indexers and storage permissions were easy to get wrong on the first apply — the indexer fails with a generic error until the managed identity has the right role on the container.

My practical takeaway for platform teams: split Terraform into layers. Foundation (network, identity, core AI services) in one state or module set; Search indexes, skillsets, and deployment-specific prompt configuration in another, applied after the data plane exists. Trying to stand everything up in one monolithic apply works in a demo and hurts in a team where someone else owns the index schema.

Also worth documenting explicitly: which identity creates scopes, which identity runs indexers, and which identity the application uses at runtime. When those differ, audit logs and Key Vault access policies need to match reality — same lesson I hit later on Databricks secret scopes, different service.

Microsoft’s direction toward Azure AI Foundry — agents, unified project workspace, file tools built in — matched what we wanted next. Upload files, attach them to an agent, let the platform handle tool routing instead of hand-rolling every retrieval step in application code.

For file Q&A, the Agent path was genuinely nicer. Less custom glue between storage, chunking, and prompt assembly. The product UX moved forward.

The integration and automation side was harder than the docs implied when I gave the talk. Portal flows that take ten minutes do not always map cleanly to Terraform or CI/CD on the first try. Agent definitions, project connections, and model deployments span resources that evolved quickly in 2025 and early 2026 — provider coverage and examples lagged the UI. I spent time on workarounds: partial automation in Terraform, partial bootstrap in scripts, and honest README notes about what still required manual steps.

That is not an argument against Foundry. It is the same gap you see whenever a platform moves from “try it in the studio” to “run it like infrastructure.” Ignite later folded more of this under Microsoft Foundry branding; the underlying lesson stays the same — plan for a hybrid phase where agents are production-ready before your modules are.

Classic RAG (OpenAI + Search) Foundry Agent
Terraform story Stronger — familiar Azure resources Catching up — more moving parts
File handling You own chunking, indexing, refresh Platform tools reduce custom code
Debuggability Every step is yours to trace Abstraction helps until it does not
Team skills App dev + Search + OpenAI Foundry concepts + still some IaC

If you are a platform engineer supporting app teams, I would still start with RAG when the goal is learning and control — especially if Terraform and RBAC are non-negotiable from day one. Move toward Agents when the product need outgrows hand-rolled retrieval and you can budget time for automation gaps.

If you are the app team and IaC is someone else’s problem, Agents may win earlier. The meetup room split roughly that way in the Q&A.

  • Automate identity first. Service principals, RBAC assignments, and Key Vault access before any index or agent exists.
  • Treat index and agent config as data, not as one-off portal clicks — even if the first version is a pipeline step running Azure CLI rather than pure Terraform.
  • Read entitlement and quota docs before a pilot becomes someone’s daily tool — same lesson as the Serverless Wrocław Azure OpenAI talk, where a silent trial expiry stopped the app cold.
  • Compare against Copilot Studio when the audience is internal and Entra-backed access is enough — not every file chat needs custom code or Foundry.

The arc I wanted to leave the room with: building AI on Azure is not only a model choice. It is identity, storage, retrieval, deployment automation, and knowing which layer Microsoft will abstract next. Foundry Agents are the direction; Terraform and provider maturity are still catching up. Plan for both.