Most tutorials on using OpenAI with Laravel stop at "install the package and call the API." That gets you a proof of concept but not a production application. A real integration needs proper configuration management, a service class you can test and swap, streaming support so users aren't staring at a spinner for fifteen seconds, and sensible handling of token limits and rate errors.
This guide walks through building all of that from scratch. By the end you'll have a clean abstraction layer that works with OpenAI today and can swap to Anthropic or a local model tomorrow without touching your application code.
Setting Up API Keys Properly
The first rule of Laravel configuration applies here as much as anywhere: never use env() outside of config files. Create a dedicated config file for your AI integration so you have a single place to manage keys, default models, and timeouts.
// config/ai.php

return [
    'default' => env('AI_PROVIDER', 'openai'),

    'providers' => [
        'openai' => [
            'api_key' => env('OPENAI_API_KEY'),
            'organization' => env('OPENAI_ORGANIZATION'),
            'base_url' => env('OPENAI_BASE_URL', 'https://api.openai.com/v1'),
            'model' => env('OPENAI_MODEL', 'gpt-4o'),
            // Cast numeric env values — env() returns strings when set
            'max_tokens' => (int) env('OPENAI_MAX_TOKENS', 4096),
            'timeout' => (int) env('OPENAI_TIMEOUT', 30),
        ],

        'anthropic' => [
            'api_key' => env('ANTHROPIC_API_KEY'),
            'base_url' => env('ANTHROPIC_BASE_URL', 'https://api.anthropic.com/v1'),
            'model' => env('ANTHROPIC_MODEL', 'claude-sonnet-4-20250514'),
            'max_tokens' => (int) env('ANTHROPIC_MAX_TOKENS', 4096),
            'timeout' => (int) env('ANTHROPIC_TIMEOUT', 30),
        ],
    ],
];
Add your keys to .env and .env.example:
AI_PROVIDER=openai
OPENAI_API_KEY=sk-your-key-here
OPENAI_ORGANIZATION=org-optional
OPENAI_MODEL=gpt-4o
OPENAI_MAX_TOKENS=4096
This structure makes it trivial to switch providers via a single environment variable and to override model settings per environment. Your staging server can use a cheaper model while production uses the flagship.
Defining the Contract
Before writing any HTTP calls, define the interface your application code will depend on. This is the key to making your AI layer swappable.
// app/Contracts/AiProvider.php

namespace App\Contracts;

use App\DataTransferObjects\AiRequest;
use App\DataTransferObjects\AiResponse;

interface AiProvider
{
    public function complete(AiRequest $request): AiResponse;

    /** @return \Generator<string> */
    public function stream(AiRequest $request): \Generator;
}
And the DTOs that carry data between your application and the provider layer:
// app/DataTransferObjects/AiRequest.php

namespace App\DataTransferObjects;

class AiRequest
{
    public function __construct(
        public readonly string $prompt,
        public readonly ?string $systemMessage = null,
        public readonly ?string $model = null,
        public readonly ?int $maxTokens = null,
        public readonly float $temperature = 0.7,
        /** @var array<int, array{role: string, content: string}> */
        public readonly array $history = [],
    ) {}
}

// app/DataTransferObjects/AiResponse.php

namespace App\DataTransferObjects;

class AiResponse
{
    public function __construct(
        public readonly string $content,
        public readonly int $promptTokens,
        public readonly int $completionTokens,
        public readonly ?string $finishReason = null,
    ) {}

    public function totalTokens(): int
    {
        return $this->promptTokens + $this->completionTokens;
    }
}
Your controllers and services never touch HTTP clients or API-specific payloads. They work with AiRequest and AiResponse objects, and the provider handles the translation.
Building the OpenAI Provider
Now implement the interface for OpenAI. This class handles building the request payload, sending it via Http, and parsing the response into your DTOs.
// app/Services/Ai/OpenAiProvider.php

namespace App\Services\Ai;

use App\Contracts\AiProvider;
use App\DataTransferObjects\AiRequest;
use App\DataTransferObjects\AiResponse;
use App\Exceptions\AiProviderException;
use Illuminate\Http\Client\ConnectionException;
use Illuminate\Http\Client\RequestException;
use Illuminate\Support\Facades\Http;

class OpenAiProvider implements AiProvider
{
    public function __construct(
        private readonly string $apiKey,
        private readonly string $baseUrl,
        private readonly string $defaultModel,
        private readonly int $defaultMaxTokens,
        private readonly int $timeout,
        private readonly ?string $organization = null,
    ) {}

    public function complete(AiRequest $request): AiResponse
    {
        try {
            $response = $this->client()
                ->timeout($this->timeout)
                ->post('/chat/completions', $this->buildPayload($request));

            $response->throw();

            $data = $response->json();

            return new AiResponse(
                content: $data['choices'][0]['message']['content'],
                promptTokens: $data['usage']['prompt_tokens'],
                completionTokens: $data['usage']['completion_tokens'],
                finishReason: $data['choices'][0]['finish_reason'],
            );
        } catch (RequestException $e) {
            throw AiProviderException::fromResponse(
                provider: 'openai',
                status: $e->response->status(),
                body: $e->response->json() ?? [],
            );
        } catch (ConnectionException $e) {
            throw AiProviderException::connectionFailed('openai', $e);
        }
    }

    /** @return \Generator<string> */
    public function stream(AiRequest $request): \Generator
    {
        $payload = $this->buildPayload($request);
        $payload['stream'] = true;

        try {
            $response = $this->client()
                ->timeout($this->timeout)
                ->withOptions(['stream' => true])
                ->post('/chat/completions', $payload);

            // Fail fast on non-2xx before attempting to parse SSE lines —
            // otherwise an error body is silently consumed as an empty stream.
            $response->throw();
        } catch (RequestException $e) {
            throw AiProviderException::fromResponse(
                provider: 'openai',
                status: $e->response->status(),
                body: $e->response->json() ?? [],
            );
        } catch (ConnectionException $e) {
            throw AiProviderException::connectionFailed('openai', $e);
        }

        $body = $response->getBody();
        $buffer = '';

        while (! $body->eof()) {
            $buffer .= $body->read(512);
            $lines = explode("\n", $buffer);
            $buffer = array_pop($lines);

            foreach ($lines as $line) {
                $line = trim($line);

                if (! str_starts_with($line, 'data: ')) {
                    continue;
                }

                $data = substr($line, 6);

                if ($data === '[DONE]') {
                    return;
                }

                $json = json_decode($data, true);
                $delta = $json['choices'][0]['delta']['content'] ?? '';

                if ($delta !== '') {
                    yield $delta;
                }
            }
        }
    }

    /** @return array<string, mixed> */
    private function buildPayload(AiRequest $request): array
    {
        $messages = [];

        if ($request->systemMessage) {
            $messages[] = [
                'role' => 'system',
                'content' => $request->systemMessage,
            ];
        }

        foreach ($request->history as $message) {
            $messages[] = $message;
        }

        $messages[] = [
            'role' => 'user',
            'content' => $request->prompt,
        ];

        return [
            'model' => $request->model ?? $this->defaultModel,
            'messages' => $messages,
            'max_tokens' => $request->maxTokens ?? $this->defaultMaxTokens,
            'temperature' => $request->temperature,
        ];
    }

    private function client(): \Illuminate\Http\Client\PendingRequest
    {
        $client = Http::baseUrl($this->baseUrl)
            ->withToken($this->apiKey)
            ->acceptJson();

        if ($this->organization) {
            $client->withHeader('OpenAI-Organization', $this->organization);
        }

        return $client;
    }
}
A few things worth calling out. The streaming implementation reads chunks from the raw PSR-7 body stream and yields content deltas as they arrive. The buildPayload method centralizes message construction so both complete and stream use the same logic. And all API errors get wrapped in a domain exception so your calling code doesn't need to know about HTTP status codes.
The Custom Exception
A dedicated exception class gives you structured error information without coupling to any specific provider:
// app/Exceptions/AiProviderException.php

namespace App\Exceptions;

use Illuminate\Http\Client\ConnectionException;

class AiProviderException extends \RuntimeException
{
    public function __construct(
        string $message,
        public readonly string $provider,
        public readonly ?int $statusCode = null,
        public readonly ?string $errorType = null,
        ?\Throwable $previous = null,
    ) {
        parent::__construct($message, 0, $previous);
    }

    /** @param array<string, mixed> $body */
    public static function fromResponse(string $provider, int $status, array $body): self
    {
        $message = $body['error']['message'] ?? 'Unknown API error';
        $type = $body['error']['type'] ?? null;

        return new self(
            message: "[{$provider}] {$message}",
            provider: $provider,
            statusCode: $status,
            errorType: $type,
        );
    }

    public static function connectionFailed(string $provider, ConnectionException $e): self
    {
        return new self(
            message: "[{$provider}] Connection failed: {$e->getMessage()}",
            provider: $provider,
            previous: $e,
        );
    }

    public function isRateLimited(): bool
    {
        return $this->statusCode === 429;
    }

    public function isServerError(): bool
    {
        return $this->statusCode !== null && $this->statusCode >= 500;
    }
}
Wiring It Up with the Service Container
Register the provider in a service provider so Laravel resolves the correct implementation everywhere:
// app/Providers/AiServiceProvider.php

namespace App\Providers;

use App\Contracts\AiProvider;
use App\Services\Ai\OpenAiProvider;
use Illuminate\Support\ServiceProvider;

class AiServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        $this->app->singleton(AiProvider::class, function ($app) {
            $provider = config('ai.default');
            $config = config("ai.providers.{$provider}");

            return match ($provider) {
                'openai' => new OpenAiProvider(
                    apiKey: $config['api_key'],
                    baseUrl: $config['base_url'],
                    defaultModel: $config['model'],
                    defaultMaxTokens: $config['max_tokens'],
                    timeout: $config['timeout'],
                    organization: $config['organization'] ?? null,
                ),
                default => throw new \InvalidArgumentException("Unsupported AI provider: {$provider}"),
            };
        });
    }
}
Register the provider in bootstrap/providers.php. Now anywhere in your application, you can type-hint AiProvider and get the configured implementation. When you add Anthropic support later, you add a case to the match expression and switch the environment variable. Zero changes to calling code.
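In Laravel 11 and later, that registration is a one-line addition to the providers file (shown here with the stock AppServiceProvider entry, assuming the default project layout):

```php
// bootstrap/providers.php

return [
    App\Providers\AppServiceProvider::class,
    App\Providers\AiServiceProvider::class,
];
```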
Handling Streaming Responses in a Controller
Streaming is where the user experience goes from "unusable" to "delightful." Instead of waiting for the entire response to generate, you can push tokens to the browser as they arrive using Server-Sent Events.
// app/Http/Controllers/ChatController.php

namespace App\Http\Controllers;

use App\Contracts\AiProvider;
use App\DataTransferObjects\AiRequest;
use App\Exceptions\AiProviderException;
use App\Http\Requests\ChatRequest;
use Symfony\Component\HttpFoundation\StreamedResponse;

class ChatController extends Controller
{
    public function __construct(
        private readonly AiProvider $ai,
    ) {}

    public function stream(ChatRequest $request): StreamedResponse
    {
        $aiRequest = new AiRequest(
            prompt: $request->validated('message'),
            systemMessage: 'You are a helpful assistant.',
            history: $request->validated('history', []),
        );

        return response()->stream(function () use ($aiRequest) {
            // ob_flush() raises a notice when no output buffer is active,
            // so guard it before pushing each event to the client.
            $flush = function (): void {
                if (ob_get_level() > 0) {
                    ob_flush();
                }
                flush();
            };

            try {
                foreach ($this->ai->stream($aiRequest) as $chunk) {
                    echo "data: " . json_encode(['content' => $chunk]) . "\n\n";
                    $flush();
                }

                echo "data: [DONE]\n\n";
                $flush();
            } catch (AiProviderException $e) {
                echo "data: " . json_encode(['error' => $e->getMessage()]) . "\n\n";
                $flush();
            }
        }, 200, [
            'Content-Type' => 'text/event-stream',
            'Cache-Control' => 'no-cache',
            'Connection' => 'keep-alive',
            'X-Accel-Buffering' => 'no',
        ]);
    }
}
The X-Accel-Buffering header is critical if you're behind Nginx — without it, Nginx buffers the entire response and defeats the purpose of streaming. On the frontend, you consume this with the EventSource API or a fetch call reading the response body as a stream.
Managing Token Limits
Token limits are the quiet source of most production bugs in AI integrations. I learned this one the hard way — you hit the context window, the API returns an error, and your user sees a generic failure message. Build token estimation into your request pipeline.
// app/Services/Ai/TokenEstimator.php

namespace App\Services\Ai;

class TokenEstimator
{
    /**
     * Rough estimation: ~4 characters per token for English text.
     * For precise counts, use tiktoken via a package like yethee/tiktoken.
     */
    public function estimate(string $text): int
    {
        return (int) ceil(mb_strlen($text) / 4);
    }

    /**
     * @param array<int, array{role: string, content: string}> $messages
     */
    public function estimateMessages(array $messages): int
    {
        $tokens = 0;

        foreach ($messages as $message) {
            // Each message has ~4 tokens of overhead (role, delimiters)
            $tokens += 4;
            $tokens += $this->estimate($message['content']);
        }

        // Every reply is primed with 3 tokens
        $tokens += 3;

        return $tokens;
    }

    /**
     * @param array<int, array{role: string, content: string}> $messages
     * @return array<int, array{role: string, content: string}>
     */
    public function trimHistory(array $messages, int $maxTokens, int $reserveForResponse = 1024): array
    {
        $budget = $maxTokens - $reserveForResponse;
        $used = 3; // base overhead

        // Always keep the system message if present
        $system = null;

        if (($messages[0]['role'] ?? null) === 'system') {
            $system = array_shift($messages);
            $used += 4 + $this->estimate($system['content']);
        }

        // Fill from the most recent messages backward
        $result = [];

        foreach (array_reverse($messages) as $message) {
            $cost = 4 + $this->estimate($message['content']);

            if ($used + $cost > $budget) {
                break;
            }

            $used += $cost;
            array_unshift($result, $message);
        }

        if ($system !== null) {
            array_unshift($result, $system);
        }

        return $result;
    }
}
The trimHistory method is the critical piece. In a chat application, conversation history grows with every exchange. Without trimming, you'll eventually exceed the context window. This method keeps the most recent messages that fit within your budget and silently drops older ones — the same sliding-window strategy chat products like ChatGPT use.
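Wiring the estimator into the request path might look like this. This is a sketch: the context-window size is an assumed value you'd pull from config for your model, and toMessageArray() is a hypothetical helper on your conversation model.

```php
// Sketch: trim conversation history before building the AiRequest.
$estimator = new TokenEstimator();

$history = $estimator->trimHistory(
    messages: $conversation->toMessageArray(), // hypothetical helper
    maxTokens: 128000,                         // assumed context window; read from config
    reserveForResponse: 1024,
);

$aiRequest = new AiRequest(
    prompt: $userMessage,
    systemMessage: 'You are a helpful assistant.',
    history: $history,
);
```

Calling this on every request keeps the payload within budget no matter how long the conversation runs.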
Retry Logic and Rate Limiting
OpenAI returns 429 errors when you exceed rate limits and 5xx errors when their servers are overloaded. Both are transient failures that should be retried. Laravel's HTTP client has built-in retry support:
private function client(): \Illuminate\Http\Client\PendingRequest
{
    $client = Http::baseUrl($this->baseUrl)
        ->withToken($this->apiKey)
        ->acceptJson()
        ->retry(
            times: 3,
            sleepMilliseconds: fn (int $attempt) => $attempt * 1000,
            when: fn ($exception) => $exception instanceof RequestException
                && in_array($exception->response->status(), [429, 500, 503]),
            throw: true,
        );

    if ($this->organization) {
        $client->withHeader('OpenAI-Organization', $this->organization);
    }

    return $client;
}
This retries up to three times with linear backoff, but only for rate limit and server errors. Client errors like 401 (bad API key) or 400 (malformed request) fail immediately. You don't want to retry those — they'll never succeed.
Adding a Second Provider
The payoff of the interface approach shows up when you add a second provider. Here's a minimal Anthropic implementation:
// app/Services/Ai/AnthropicProvider.php

namespace App\Services\Ai;

use App\Contracts\AiProvider;
use App\DataTransferObjects\AiRequest;
use App\DataTransferObjects\AiResponse;
use Illuminate\Support\Facades\Http;

class AnthropicProvider implements AiProvider
{
    public function __construct(
        private readonly string $apiKey,
        private readonly string $baseUrl,
        private readonly string $defaultModel,
        private readonly int $defaultMaxTokens,
        private readonly int $timeout,
    ) {}

    public function complete(AiRequest $request): AiResponse
    {
        $response = $this->client()
            ->timeout($this->timeout)
            ->post('/messages', $this->buildPayload($request));

        $response->throw();

        $data = $response->json();

        return new AiResponse(
            content: $data['content'][0]['text'],
            promptTokens: $data['usage']['input_tokens'],
            completionTokens: $data['usage']['output_tokens'],
            finishReason: $data['stop_reason'],
        );
    }

    /** @return \Generator<string> */
    public function stream(AiRequest $request): \Generator
    {
        $payload = $this->buildPayload($request);
        $payload['stream'] = true;

        $response = $this->client()
            ->timeout($this->timeout)
            ->withOptions(['stream' => true])
            ->post('/messages', $payload);

        // Fail fast on non-2xx before parsing SSE lines
        $response->throw();

        $body = $response->getBody();
        $buffer = '';

        while (! $body->eof()) {
            $buffer .= $body->read(512);
            $lines = explode("\n", $buffer);
            $buffer = array_pop($lines);

            foreach ($lines as $line) {
                $line = trim($line);

                if (! str_starts_with($line, 'data: ')) {
                    continue;
                }

                $json = json_decode(substr($line, 6), true);

                if (($json['type'] ?? '') === 'content_block_delta') {
                    yield $json['delta']['text'] ?? '';
                }
            }
        }
    }

    /** @return array<string, mixed> */
    private function buildPayload(AiRequest $request): array
    {
        $payload = [
            'model' => $request->model ?? $this->defaultModel,
            'max_tokens' => $request->maxTokens ?? $this->defaultMaxTokens,
            'messages' => [],
        ];

        if ($request->systemMessage) {
            $payload['system'] = $request->systemMessage;
        }

        foreach ($request->history as $message) {
            $payload['messages'][] = $message;
        }

        $payload['messages'][] = [
            'role' => 'user',
            'content' => $request->prompt,
        ];

        return $payload;
    }

    private function client(): \Illuminate\Http\Client\PendingRequest
    {
        return Http::baseUrl($this->baseUrl)
            ->withHeader('x-api-key', $this->apiKey)
            ->withHeader('anthropic-version', '2023-06-01')
            ->acceptJson();
    }
}
Notice how different the Anthropic API is under the hood. The system message goes in a top-level system field instead of as a message. Authentication uses x-api-key instead of a Bearer token. Streaming events have a completely different structure. But your controller doesn't care — it calls $this->ai->stream() and gets the same interface either way.
Update the service provider to handle both:
return match ($provider) {
    'openai' => new OpenAiProvider(
        apiKey: $config['api_key'],
        baseUrl: $config['base_url'],
        defaultModel: $config['model'],
        defaultMaxTokens: $config['max_tokens'],
        timeout: $config['timeout'],
        organization: $config['organization'] ?? null,
    ),
    'anthropic' => new AnthropicProvider(
        apiKey: $config['api_key'],
        baseUrl: $config['base_url'],
        defaultModel: $config['model'],
        defaultMaxTokens: $config['max_tokens'],
        timeout: $config['timeout'],
    ),
    default => throw new \InvalidArgumentException("Unsupported AI provider: {$provider}"),
};
Testing with Fakes
The interface makes testing straightforward. Create a fake implementation for your test suite:
// tests/Fakes/FakeAiProvider.php

namespace Tests\Fakes;

use App\Contracts\AiProvider;
use App\DataTransferObjects\AiRequest;
use App\DataTransferObjects\AiResponse;
use PHPUnit\Framework\Assert;

class FakeAiProvider implements AiProvider
{
    /** @var array<int, AiRequest> */
    public array $requests = [];

    public function __construct(
        private readonly string $response = 'Fake AI response',
    ) {}

    public function complete(AiRequest $request): AiResponse
    {
        $this->requests[] = $request;

        return new AiResponse(
            content: $this->response,
            promptTokens: 10,
            completionTokens: 20,
        );
    }

    /** @return \Generator<string> */
    public function stream(AiRequest $request): \Generator
    {
        $this->requests[] = $request;

        foreach (explode(' ', $this->response) as $word) {
            yield $word . ' ';
        }
    }

    public function assertRequestCount(int $count): void
    {
        // assert() can be disabled via zend.assertions, so use a
        // real PHPUnit assertion that always runs
        Assert::assertCount($count, $this->requests);
    }
}
In your tests, bind the fake in the container:
public function test_chat_endpoint_returns_streamed_response(): void
{
    $fake = new FakeAiProvider('Hello from the AI');
    $this->app->instance(AiProvider::class, $fake);

    $response = $this->postJson('/chat/stream', [
        'message' => 'What is Laravel?',
    ]);

    $response->assertOk();
    $fake->assertRequestCount(1);
}
No HTTP calls, no API keys needed, fast and deterministic. You can also use Http::fake() to test the actual provider implementation against fixture data if you want to verify the payload construction.
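For the latter, a test of OpenAiProvider's payload construction might look like this. The fixture values are illustrative, not real API output:

```php
public function test_openai_provider_builds_chat_payload(): void
{
    // Stub every request to the OpenAI host with a minimal fixture
    Http::fake([
        'api.openai.com/*' => Http::response([
            'choices' => [['message' => ['content' => 'Hi'], 'finish_reason' => 'stop']],
            'usage' => ['prompt_tokens' => 5, 'completion_tokens' => 1],
        ]),
    ]);

    $provider = new OpenAiProvider(
        apiKey: 'test-key',
        baseUrl: 'https://api.openai.com/v1',
        defaultModel: 'gpt-4o',
        defaultMaxTokens: 4096,
        timeout: 30,
    );

    $response = $provider->complete(new AiRequest(prompt: 'Hello'));

    $this->assertSame('Hi', $response->content);

    // Verify the payload the provider actually sent
    Http::assertSent(fn ($request) => $request['model'] === 'gpt-4o'
        && $request['messages'][0]['role'] === 'user');
}
```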
Production Considerations
A few things to get right before shipping this to users.
Log costs, not content. Track token usage per request so you can monitor spend and catch runaway costs. But don't log the actual prompt or response content unless you've thought carefully about privacy implications. User conversations with an AI may contain sensitive information.
Set hard limits. Use maxTokens in every request. Without it, a model can generate an arbitrarily long response and burn through your budget on a single API call. Set it to the minimum that makes sense for your use case.
Queue long-running requests. If you're doing batch processing — summarizing documents, generating reports, analyzing datasets — don't do it synchronously. Dispatch a job, store the result, and notify the user when it's ready. AI API calls can take 30 seconds or more, which is far too long for a web request.
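A minimal sketch of that pattern, reusing the AiProvider contract from earlier. The job class, Document model, and property names here are illustrative placeholders:

```php
// app/Jobs/SummarizeDocument.php — sketch; names are illustrative
namespace App\Jobs;

use App\Contracts\AiProvider;
use App\DataTransferObjects\AiRequest;
use App\Models\Document;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Queue\Queueable;

class SummarizeDocument implements ShouldQueue
{
    use Queueable;

    public int $timeout = 120; // allow for slow AI responses

    public function __construct(
        private readonly Document $document,
    ) {}

    // The container injects the bound AiProvider into handle()
    public function handle(AiProvider $ai): void
    {
        $response = $ai->complete(new AiRequest(
            prompt: "Summarize this document:\n\n" . $this->document->body,
            maxTokens: 1024,
        ));

        $this->document->update(['summary' => $response->content]);
    }
}
```

Dispatch it with `SummarizeDocument::dispatch($document)` and the web request returns immediately.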
Cache when appropriate. If the same prompt always produces the same result (temperature 0, deterministic use case), cache the response. There's no reason to pay for the same answer twice.
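A sketch of that caching wrapper, assuming it lives in a class that holds the AiProvider as $this->ai. The key scheme and one-day TTL are illustrative choices:

```php
use Illuminate\Support\Facades\Cache;

// Key the cache on everything that affects the output
$key = 'ai:' . hash('sha256', $aiRequest->prompt . '|' . ($aiRequest->model ?? 'default'));

$response = Cache::remember($key, now()->addDay(), function () use ($aiRequest) {
    return $this->ai->complete($aiRequest);
});
```

This only makes sense for deterministic requests — at temperature 0 with no per-user context in the prompt.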
Conclusion
Everything here follows standard Laravel patterns you already know: an interface for abstraction, DTOs for data transport, a service provider for wiring, and dependency injection everywhere. The AI-specific concerns — streaming SSE responses, estimating tokens, trimming conversation history — sit inside the provider implementations where they belong.
Start with one provider, get your product working, and add the interface when you have a concrete reason to swap. I've shipped this exact pattern in two production apps now, and the abstraction cost me maybe an extra hour up front. It saved days when we switched from GPT-4 to Claude mid-project. AI APIs are just HTTP services at the end of the day, and Laravel already gives you every tool you need to integrate them cleanly.