Documentation: Azure OpenAI (AOAI)

This document provides further details and context for the Azure OpenAI (AOAI) section of the Azure Startup Checklist.

Review AOAI best practices
Why: Leveraging large language models (LLMs) effectively requires understanding best practices for prompt engineering, model selection, security, and responsible AI.
How: Familiarize yourself with Azure OpenAI documentation covering key concepts, model capabilities (GPT-4, GPT-3.5-Turbo, Embeddings, DALL-E), and recommended patterns.
Resources:
Follow guidance for using your own data with AOAI
Why: Retrieval-Augmented Generation (RAG) patterns allow you to ground LLM responses in your specific data, improving relevance and accuracy. Implementing this securely and effectively is crucial.
How: Explore Azure OpenAI’s “on your data” feature or implement custom RAG solutions using services like Azure AI Search to index your data and provide relevant context to the LLM during generation.
Resources:
- Azure OpenAI on your data
- Retrieval Augmented Generation (RAG) in Azure AI Search
Understand AOAI data processing and storage
Why: It’s essential to know how your prompts, completions, embeddings, and training data (if applicable for fine-tuning) are processed and stored by the Azure OpenAI service to meet compliance and privacy requirements.
How: Review the official Azure OpenAI data privacy and security documentation.
Resources:
- Data, privacy, and security for Azure OpenAI Service
Monitor AOAI data residency, concurrency, and cost
Why: As you scale your use of AOAI, these operational factors become critical.
How:
- Data Residency: Understand where your data is processed and stored based on the Azure region you deploy AOAI to.
- Concurrency: Monitor token usage (Prompt + Completion tokens) and manage quotas (Tokens-Per-Minute, Requests-Per-Minute) to ensure your application scales appropriately. Implement retry logic and potentially provisioned throughput for high-scale scenarios.
- Cost: Track token consumption closely as it directly impacts cost. Optimize prompts and leverage different models based on cost/performance trade-offs.
Resources: