Contents
- Why the choice is hard to reverse
- Match the choice to your cloud footprint
- The model menu, honestly compared
- Regions and sovereignty
- Identity, network, key control
- RAG-specific operational shape
- Pricing in practice, not theory
- The lock-in question
- A 10-minute decision framework
- Where ConvoSuite fits
If you are about to ship a retrieval-augmented generation (RAG) system at any non-trivial scale, the foundation-model layer is one of the two or three choices you will not be able to reverse cheaply. Models are sticky — not because of vendor lock-in in the abstract sense, but because every prompt, evaluation set, guardrail, and observability hook in your codebase quietly absorbs assumptions about the model's behaviour, latency profile, and pricing. Six months in, swapping is a project, not a config change.
This article is a side-by-side walk-through of the two enterprise-grade choices most ConvoSuite customers shortlist in 2026: AWS Bedrock and Azure OpenAI. Both are mature, both have credible SLAs, both expose first-class identity and network controls. They are not interchangeable. The right answer almost always comes down to four things: your existing cloud footprint, the regional sovereignty constraints you have to satisfy, the model families you want to mix, and how much lock-in to a single provider's tooling roadmap you are willing to accept.
1. Match the choice to your existing cloud footprint
The single largest cost driver in any RAG deployment is not inference. It is the data plumbing: where embeddings live, where the vector index runs, where logs land for retention, and how packets traverse VPCs. If your enterprise data already sits in S3, Redshift, Kinesis, and EKS, then doing retrieval from outside of AWS means cross-cloud egress charges on every query, plus a new privacy review for the egress destination. The reverse holds if your data is in ADLS Gen2, Synapse, and AKS.
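For an honest comparison, do the napkin math: assume 50 KB of context per RAG call, 200 K calls a day, and an egress price near $0.09/GB. That is 10 GB a day, or about $0.90 a day and roughly $330 a year, which is negligible on its own. The bill scales linearly with bytes moved per call, though. If retrieved documents, index traffic, and logs cross the boundary too and the total reaches 1.5 MB per call (300 GB a day), the same arithmetic gives roughly $27 a day, or about $9,900 a year, in cross-cloud egress before you have paid for a single token. We have seen numbers of that order sink a "best-model-wins" decision more than once.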
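The egress arithmetic is easy to parameterise so you can plug in your own payload sizes. The sketch below is a rough estimator, not a billing tool: the function name is ours, it assumes decimal units (1 GB = 10^9 bytes) and a flat per-GB rate, and real price sheets are usually tiered.

```python
def egress_cost_usd(bytes_per_call: float, calls_per_day: int,
                    usd_per_gb: float = 0.09) -> tuple[float, float]:
    """Return (daily, yearly) cross-cloud egress cost in USD.

    Assumes decimal units (1 GB = 1e9 bytes) and a flat per-GB rate.
    Real price sheets are tiered, so treat the result as an estimate.
    """
    gb_per_day = bytes_per_call * calls_per_day / 1e9
    daily = gb_per_day * usd_per_gb
    return daily, daily * 365


# 50 KB of context per call, 200 K calls a day, $0.09/GB
daily, yearly = egress_cost_usd(50_000, 200_000)
print(f"${daily:.2f}/day, ${yearly:,.2f}/year")
```

With 50 KB per call this comes to about $0.90 a day. The number only becomes material when the bytes moved per call grow by one or two orders of magnitude, so measure the full payload (documents, index traffic, logs), not just the prompt context.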
2. The model menu, honestly compared
Bedrock's strength is plurality. In a single account, behind a single IAM-mediated API, you get Anthropic Claude (Opus, Sonnet, Haiku tiers), Meta Llama, Mistral, AI21 Jamba, Cohere Command, Amazon's own Nova and Titan families, and an expanding catalogue of fine-tuned and distilled variants. For a RAG stack, the value of this plurality is concrete: you can route long-context analytical queries to Claude Opus, fast extraction queries to Claude Haiku, and embedding workloads to Titan or Cohere — all under one VPC, one set of logs, one billing line.
Azure OpenAI's strength is depth in a narrower catalogue. You get OpenAI's flagship models (GPT-4o, the GPT-5 family, o-series reasoners, DALL-E, Whisper) with the Microsoft enterprise wrapper around them: regional deployments, Private Link, customer-managed keys, and a content filter that Microsoft warrants. If "we standardise on OpenAI" is already a decision your CTO has made, Azure OpenAI removes most of the friction of operating that decision.
A practical rule of thumb: if your roadmap calls for any chance of multi-model routing in the next 18 months — for cost, for jurisdiction, for fallback — start on Bedrock. If your roadmap is a single-model bet on OpenAI and you want Microsoft's compliance umbrella, start on Azure OpenAI.
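To make "multi-model routing" concrete, here is a minimal sketch of a router that maps a request type to a model tier and falls back once if that tier is unavailable. The tier labels, the `Request` type, and the fallback table are all hypothetical placeholders, not real model identifiers or any provider's API.

```python
from dataclasses import dataclass

# Hypothetical tier labels, not real model identifiers.
ROUTES = {
    "analysis": "large-long-context-model",
    "extraction": "small-fast-model",
    "embedding": "embedding-model",
}
# Which tier to try when the preferred one is unavailable (assumed policy).
FALLBACKS = {"large-long-context-model": "small-fast-model"}


@dataclass(frozen=True)
class Request:
    kind: str  # "analysis", "extraction", or "embedding"


def pick_model(req: Request, unavailable: frozenset = frozenset()) -> str:
    """Choose a model tier for a request, with one level of fallback."""
    model = ROUTES[req.kind]
    if model not in unavailable:
        return model
    fallback = FALLBACKS.get(model)
    if fallback is None or fallback in unavailable:
        raise RuntimeError(f"no available model for {req.kind!r}")
    return fallback


print(pick_model(Request("extraction")))
print(pick_model(Request("analysis"),
                 unavailable=frozenset({"large-long-context-model"})))
```

If a table like this is plausibly in your future, the breadth of the catalogue behind one API matters. If it never will be, it does not.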
3. Regions and sovereignty
Both providers publish region matrices that, frustratingly, change every quarter. The pattern to watch is not just "does region X have model Y today". It is "are both regions in my primary + DR pair generally available (GA), and have they been GA for at least one OS patch cycle?" Pre-GA regions ship with caveats around quota, content filters, and latency that should not be in your production critical path.
For European workloads with strict data-residency requirements, the boring choice in 2026 is still: Azure OpenAI in Sweden Central + West Europe (Netherlands) for OpenAI models, or Bedrock in eu-central-1 (Frankfurt) + eu-west-1 (Ireland) for Anthropic models. Both pairs have been through real disaster-recovery drills with paying customers. Avoid building your residency story on a region that only your sales rep has heard of.
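The region-pair test is simple enough to encode as a pre-flight check. This is a sketch under stated assumptions: the `RegionStatus` type is ours, the 90-day minimum is an arbitrary stand-in for "has been GA long enough", and the regions and dates are invented for illustration. Take real GA dates from the provider's own region matrix.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RegionStatus:
    region: str
    ga_since: Optional[date]  # None: the model is not yet GA in this region


def pair_is_production_ready(primary: RegionStatus, dr: RegionStatus,
                             today: date, min_days_ga: int = 90) -> bool:
    """True only if BOTH regions are GA and have been for min_days_ga days."""
    return all(
        s.ga_since is not None and (today - s.ga_since).days >= min_days_ga
        for s in (primary, dr)
    )


# Hypothetical regions and dates, for illustration only.
primary = RegionStatus("region-a", date(2025, 1, 15))
dr = RegionStatus("region-b", date(2025, 11, 20))
print(pair_is_production_ready(primary, dr, today=date(2026, 1, 10)))
```

In this example the DR region has been GA for only 51 days, so the pair fails the check even though the primary region passes comfortably.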
4. Identity, network, and key control
Bedrock integrates with the wider AWS identity surface: IAM roles for service-to-service calls, IAM Identity Center for human access, KMS for customer-managed keys, VPC endpoints for private connectivity, and CloudTrail for an immutable audit log. Bedrock Guardrails layers content-safety policies on top, and the policies can be versioned and tested as code.
Azure OpenAI integrates with Entra ID (formerly Azure AD), customer-managed keys via Key Vault, Private Link for VNet-only routing, and the Microsoft Defender for Cloud signal stream. The content filter is Microsoft-managed by default, with an "ask for it" approval flow to relax categories where the use case requires it (legal discovery, security research, medical Q&A).
Neither is obviously better. The right question is "where does my existing identity, key, and network governance already live?" Doubling the number of identity providers your security team has to monitor is rarely a net win.
5. RAG-specific operational shape
A production RAG stack has roughly five moving parts: an embedding model, a vector store, a re-ranker, a generation model, and an evaluation harness. Bedrock has first-party stories for all five (Titan embeddings, OpenSearch Serverless or Aurora PostgreSQL with pgvector for the index, Cohere Rerank or Amazon Rerank for re-ranking, Claude or Nova for generation, and Bedrock Evaluations for benchmarking). Azure OpenAI has first-party stories for two (embeddings, generation) and outsources the rest to Azure AI Search, Azure Cosmos DB, or partner stacks.
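If the five parts feel abstract, the toy pipeline below shows where each one sits. Every component is a deliberately naive stand-in (bag-of-words "embeddings", an in-memory dict as the "vector store", term overlap as the "re-ranker", an echo as the "generator", a substring check as the "evaluation"), and none of it calls a real provider. It is only meant to show the seams where a managed service would plug in.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words term count (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: dict, k: int = 3) -> list:
    """Vector-store lookup: top-k documents by cosine similarity."""
    q = embed(query)
    return sorted(index, key=lambda doc: cosine(q, index[doc]), reverse=True)[:k]


def rerank(query: str, docs: list) -> list:
    """Toy re-ranker: prefer documents sharing more distinct query terms."""
    terms = set(query.lower().split())
    return sorted(docs, key=lambda d: len(terms & set(d.lower().split())),
                  reverse=True)


def generate(query: str, context: list) -> str:
    """Stand-in for the generation model: echoes the top-ranked context."""
    if not context:
        return "No context found."
    return f"Q: {query}\nA (from context): {context[0]}"


def evaluate(answer: str, expected_phrase: str) -> bool:
    """Minimal evaluation check: does the answer contain the phrase?"""
    return expected_phrase.lower() in answer.lower()


docs = [
    "Batch inference is cheaper for non-interactive jobs",
    "Private connectivity uses VPC endpoints",
    "Customer-managed keys live in a key management service",
]
index = {d: embed(d) for d in docs}  # the "vector store"
query = "which jobs suit batch inference"
answer = generate(query, rerank(query, retrieve(query, index, k=2)))
print(answer)
print("passes eval:", evaluate(answer, "batch inference"))
```

Each function boundary here is a place where you either accept the provider's first-party option or bring your own, which is why it is worth listing the five explicitly before comparing stacks.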
That sounds like a Bedrock win, but the catch is integration glue. Bedrock's first-party stack is wider but each piece is younger. Azure AI Search has been a battle-tested product for five years; you get hybrid search, scoring profiles, and faceted filters out of the box. If your differentiator is "we have a clever vector index" you will probably outgrow either default and bring your own; if it isn't, Azure AI Search is the path of least resistance.
6. Pricing in practice, not in theory
Both providers price by tokens. Both publish the per-million-token rate on a public page. Neither rate captures what you will actually spend. Real production cost is dominated by three line items that are not on the pricing page: context-window padding (the retrieved chunks you stuff into the prompt that the model never references but you still pay for), retry storms (when a downstream tool times out and your agent retries the whole conversation), and idle provisioned throughput (capacity you reserved for SLA reasons and did not use overnight).
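One way to see how far the real bill can drift from the list price is to model the three hidden line items explicitly. The function and every number below are illustrative assumptions (a $3 per million tokens rate, 50% padding, a 10% retry rate, $500 of idle capacity), not measurements from either provider.

```python
def effective_monthly_cost(usd_per_m_tokens: float,
                           useful_tokens_m: float,
                           padding_ratio: float,
                           retry_rate: float,
                           idle_provisioned_usd: float) -> dict:
    """Split estimated monthly spend into useful, padding, retries, idle.

    padding_ratio: unused context tokens per useful token (0.5 = 50% extra).
    retry_rate: fraction of traffic replayed in full (0.1 = 10% extra).
    """
    useful = useful_tokens_m * usd_per_m_tokens
    padding = useful * padding_ratio
    retries = (useful + padding) * retry_rate  # retries replay padding too
    total = useful + padding + retries + idle_provisioned_usd
    return {"useful": useful, "padding": padding, "retries": retries,
            "idle": idle_provisioned_usd, "total": total}


# Hypothetical workload: 1,000 M useful tokens a month at $3 per M tokens.
cost = effective_monthly_cost(3.0, 1000, padding_ratio=0.5,
                              retry_rate=0.1, idle_provisioned_usd=500)
for item, usd in cost.items():
    print(f"{item:>8}: ${usd:,.2f}")
```

Under these assumed inputs the list-price estimate is $3,000 and the modelled total is $5,450, so the hidden items account for roughly 45% of the bill. Your ratios will differ. The useful part is having a place to put them.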
Bedrock has provisioned throughput, on-demand, and Batch Inference (which is roughly 50% cheaper for non-interactive jobs). Azure OpenAI has Provisioned Throughput Units (PTUs), pay-as-you-go, and a Batch tier (50% off, with a 24-hour target turnaround). Both pricing surfaces reward you for separating interactive traffic from analytical jobs. Do that separation early, in the application layer, and you can save 30–50% on inference without changing models, provided most of your token volume can tolerate batch latency: at a 50% discount, the saving is half the share of traffic you move to batch.
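The separation itself can start as a one-line rule in the application layer, and the resulting saving is simple arithmetic. Both functions below are sketches with names of our own choosing. The 24-hour cut-off and the 50% discount come from the batch tiers described above, and the rule assumes a job is batchable only when nobody is waiting on it.

```python
def tier_for(user_is_waiting: bool, deadline_hours: float) -> str:
    """Pick a pricing tier: batch only if nobody waits and 24 h is acceptable."""
    if user_is_waiting or deadline_hours < 24:
        return "on_demand"
    return "batch"


def blended_saving(batch_share: float, batch_discount: float = 0.5) -> float:
    """Fraction of inference spend saved by moving batch_share of token
    volume to a tier priced at (1 - batch_discount) of on-demand."""
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    return batch_share * batch_discount


print(tier_for(user_is_waiting=True, deadline_hours=0.01))   # chat turn
print(tier_for(user_is_waiting=False, deadline_hours=48))    # nightly re-index
for share in (0.2, 0.6, 1.0):
    print(f"{share:.0%} on batch -> {blended_saving(share):.0%} saved")
```

The loop makes the dependency explicit: a 30% saving needs about 60% of token volume on the batch tier, and 50% is the ceiling when everything can wait.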
7. The lock-in question
If you implement against the lowest-common-denominator API surface — chat-completions, embeddings, function-calling — switching providers later is a one-week project, not a one-quarter project. If you build deep against Bedrock Agents, Knowledge Bases, and Guardrails, or against Azure AI Foundry, Prompt Flow, and AI Search scoring profiles, you are buying convenience now and selling optionality later. Both are valid trades. The mistake is making the trade by accident.
ConvoSuite's reference deployments use a thin internal abstraction that lets the same agent run on either provider with a profile switch. We do not recommend this for every customer — the abstraction has a cost — but for any project that might cross a sovereignty boundary in its lifetime, it pays for itself the first time you have to migrate.
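For readers who have not built one, a thin abstraction with a profile switch can be as small as the sketch below. It is an illustration of the pattern, not ConvoSuite's actual code: the class names, profile keys, and model identifiers are hypothetical, and the adapters return canned strings where a real one would call the provider's SDK.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatProvider(Protocol):
    """The lowest-common-denominator surface the agent is allowed to use."""
    def complete(self, system: str, user: str) -> str: ...


@dataclass
class BedrockAdapter:
    model_id: str

    def complete(self, system: str, user: str) -> str:
        # A real adapter would call the AWS SDK here (omitted in this sketch).
        return f"[bedrock:{self.model_id}] {user}"


@dataclass
class AzureOpenAIAdapter:
    deployment: str

    def complete(self, system: str, user: str) -> str:
        # A real adapter would call the Azure OpenAI SDK here (omitted).
        return f"[azure:{self.deployment}] {user}"


# Hypothetical profiles; the identifiers are placeholders, not real models.
PROFILES = {
    "aws": lambda: BedrockAdapter(model_id="example-model-a"),
    "azure": lambda: AzureOpenAIAdapter(deployment="example-deployment-b"),
}


def load_provider(profile: str) -> ChatProvider:
    try:
        return PROFILES[profile]()
    except KeyError:
        raise ValueError(f"unknown profile: {profile!r}") from None


def run_agent(provider: ChatProvider, question: str) -> str:
    """Agent logic sees only ChatProvider, never a provider SDK."""
    return provider.complete("You answer from retrieved context.", question)


for profile in ("aws", "azure"):
    print(run_agent(load_provider(profile), "Summarise the DR plan."))
```

The cost mentioned above shows up in what the protocol leaves out: anything provider-specific (managed agents, knowledge bases, scoring profiles) either stays outside the interface or has to be re-implemented behind it.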
8. A 10-minute decision framework
If you remember nothing else from this article, run through these eight questions before signing an order form:
- Where does 80% of my data live today — AWS, Azure, or somewhere else? Start on the same cloud.
- Do I need more than one model family in production within 18 months? If yes, prefer Bedrock.
- Is OpenAI the single non-negotiable model brand for my stakeholders? If yes, prefer Azure OpenAI.
- What is my primary + DR region pair, and is it GA on the provider's roadmap today?
- Where does my identity, KMS, and network governance live?
- What is my embedding + vector-store choice? Does the provider's first-party option fit, or am I bringing my own?
- What share of traffic is analytical (can use batch tier) vs. interactive (cannot)?
- How portable is the abstraction layer I plan to write between my app and the model? Can I switch on a profile, or am I writing provider-specific business logic?
Nine out of ten teams that work through these questions end up with an obvious answer they had been talking around for weeks. The remaining ten percent are usually wrestling with a deeper organisational problem (no clear data owner, no agreed DR strategy) that no model choice is going to fix.
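If it helps to make the framework mechanical, the eight questions can be written down as a small scoring function. This is one possible encoding, not a validated model: the field names are ours, each of the four "which provider" questions gets an equal vote (an assumption), and the other four only produce open items to resolve before signing.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    data_cloud: str                      # "aws", "azure", or "other"
    multi_model_within_18m: bool
    openai_non_negotiable: bool
    region_pair_ga: bool
    governance_cloud: str                # where identity, keys, network live
    first_party_vector_store_fits: bool
    batch_share: float                   # 0.0-1.0 of traffic that can batch
    portable_abstraction: bool


def recommend(a: Answers) -> tuple:
    """Return (provider or 'undecided', list of open items)."""
    votes = {"bedrock": 0, "azure_openai": 0}
    home = {"aws": "bedrock", "azure": "azure_openai"}
    for cloud in (a.data_cloud, a.governance_cloud):
        if cloud in home:
            votes[home[cloud]] += 1
    if a.multi_model_within_18m:
        votes["bedrock"] += 1
    if a.openai_non_negotiable:
        votes["azure_openai"] += 1

    open_items = []
    if not a.region_pair_ga:
        open_items.append("primary + DR region pair is not GA yet")
    if not a.first_party_vector_store_fits:
        open_items.append("plan for a bring-your-own vector store")
    if a.batch_share > 0:
        open_items.append(
            f"route {a.batch_share:.0%} of traffic to the batch tier")
    if not a.portable_abstraction:
        open_items.append("provider-specific logic will raise switching cost")

    if votes["bedrock"] == votes["azure_openai"]:
        return "undecided", open_items
    return max(votes, key=votes.get), open_items


example = Answers("aws", True, False, True, "aws", True, 0.4, True)
print(recommend(example))
```

An "undecided" result is itself informative: it usually means the votes conflict (for example, data in AWS but a firm OpenAI mandate), which is a conversation to have with stakeholders rather than a tie to break in code.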
9. Where ConvoSuite fits
ConvoSuite ships on both AWS Marketplace and Azure Marketplace, with the same product surface on each. We do the abstraction work described above (one configuration, run on Bedrock or Azure OpenAI) so that teams can pilot on whichever cloud has the data, and stay on whichever cloud is cheapest at scale. If you are at the start of a RAG project and still weighing the foundation, we are happy to do a free architecture call and tell you, honestly, which side of the fence we would put your workload on.