
AI Integration

CruzJS provides AIService, an injectable facade for AI capabilities. The underlying provider is determined by the adapter — Workers AI on Cloudflare, Amazon Bedrock on AWS, Vertex AI on GCP, and OpenAI-compatible APIs on Azure, DigitalOcean, and Docker (including Ollama for local inference). All methods return typed results or null on failure — they never throw.

Configuration depends on your deployment adapter.

Cloudflare (Workers AI)

Set the following environment variables and enable the Workers AI binding:

CF_AI_GATEWAY_ID=your-gateway-id
CF_AIG_TOKEN=your-gateway-token

Then add the binding to wrangler.toml:

[ai]
binding = "AI"

For full details, see the Cloudflare AI guide.

Amazon Bedrock (AWS)

Amazon Bedrock credentials and region are configured automatically by the AWS adapter. Optionally override the region:

AWS_BEDROCK_REGION=us-east-1

Vertex AI (GCP)

GCP credentials are picked up automatically via Application Default Credentials. No additional environment variables are required when deploying to Cloud Run or Cloud Functions.

Azure, DigitalOcean, Docker (OpenAI-compatible)

Set an API key and, optionally, a custom base URL:

OPENAI_API_KEY=your-api-key
OPENAI_BASE_URL=https://api.openai.com/v1 # default

For local inference with Ollama, point the base URL to your Ollama instance:

OPENAI_API_KEY=ollama
OPENAI_BASE_URL=http://localhost:11434/v1
Inject AIService into any service and call its methods:

import { AIService, Inject, Injectable } from '@cruzjs/core';

@Injectable()
export class MyService {
  constructor(@Inject(AIService) private ai: AIService) {}

  async process(text: string) {
    const summary = await this.ai.chat({
      prompt: text,
      system: 'Summarize the following text concisely.',
      size: 'medium',
    });
    return summary; // string | null
  }
}
Tune calls with optional parameters:

const response = await this.ai.chat({
  prompt: 'Explain edge computing in one paragraph.',
  system: 'You are a technical writer.',
  size: 'small', // 'small' | 'medium' | 'large'
  temperature: 0.7,
  maxTokens: 200,
});
Size    Model                  Best For
small   Gemini 2.5 Flash Lite  Simple tasks, fast
medium  Gemini 2.5 Flash       General purpose (default)
large   Gemini 2.5 Pro         Complex reasoning

Extract typed data from unstructured text with automatic Zod validation and retry:

import { z } from 'zod';

const schema = z.object({
  title: z.string(),
  topics: z.array(z.string()),
  sentiment: z.enum(['positive', 'negative', 'neutral']),
});

const result = await this.ai.extractStructured({
  prompt: articleText,
  system: 'Extract metadata from this article.',
  schema,
  schemaName: 'ArticleMetadata',
  size: 'medium',
  maxRetries: 3,
});

if (result) {
  console.log(result.title); // Fully typed
}

Generate vector embeddings for search and similarity:

const vectors = await this.ai.embed(['search query', 'document text']);
// vectors: number[][] | null

Model options: 'small' (384d), 'base' (768d, default), 'large' (1024d).
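A common next step after embed() is ranking documents by similarity to a query vector. A minimal cosine-similarity helper, shown below, is a plain utility sketch, not part of the AIService API:

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns 1 for identical direction, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// With vectors returned by this.ai.embed(['search query', 'document text']):
// const [query, doc] = vectors;
// const score = cosineSimilarity(query, doc);
```

Because embed() returns null on failure, check the result before indexing into it.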

Describe an image, optionally guiding the model with a prompt:

const description = await this.ai.describeImage(imageBuffer);
// Or with a custom prompt:
const analysis = await this.ai.describeImage(imageBuffer, 'What objects are in this image?');

Classify the sentiment of a piece of text:

const sentiment = await this.ai.analyzeSentiment('This product is amazing!');
// { label: 'POSITIVE', score: 0.95 } | null
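In practice you often want to act only on a confident classification. The helper below is a hypothetical sketch: the { label, score } shape matches analyzeSentiment()'s documented return value, but the 0.8 threshold is an illustrative choice, not a library default:

```typescript
// Shape matching analyzeSentiment()'s documented result (or null on failure).
type Sentiment = { label: string; score: number } | null;

// True only for a non-null POSITIVE result at or above the confidence
// threshold; the default of 0.8 is an arbitrary illustrative value.
function isConfidentlyPositive(s: Sentiment, threshold = 0.8): boolean {
  return s !== null && s.label === 'POSITIVE' && s.score >= threshold;
}
```

Handling the null case inside the helper keeps calling code free of repeated null checks.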
Use AIService inside a tRPC procedure by resolving it from the application container:

export const aiRouter = router({
  summarize: protectedProcedure
    .input(z.object({ text: z.string().max(10000) }))
    .mutation(async ({ input }) => {
      const container = await getAppContainer();
      const ai = container.resolve(AIService);
      return ai.chat({
        prompt: input.text,
        system: 'Summarize concisely.',
      });
    }),
});

All AIService methods return null on failure:

  • AI not configured (missing env vars)
  • Workers AI binding not available (local dev)
  • Network errors or model failures
  • Schema validation failures (after retries)
const result = await this.ai.chat({ prompt: 'Hello' });
if (!result) {
  // Handle gracefully — provide fallback or skip AI features
}
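Since every method follows the same return-null-on-failure contract, the fallback pattern can be factored into a small generic wrapper. This is a hypothetical helper, not part of AIService:

```typescript
// Run a null-returning AI call and substitute a default when it fails.
async function withFallback<T>(
  call: () => Promise<T | null>,
  fallback: T,
): Promise<T> {
  const result = await call();
  return result ?? fallback;
}

// const summary = await withFallback(
//   () => this.ai.chat({ prompt: text, system: 'Summarize concisely.' }),
//   'Summary unavailable.',
// );
```

The nullish-coalescing operator preserves legitimate falsy results such as an empty string, substituting the fallback only for null or undefined.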
Local development notes:

  • AI Gateway methods (chat, extractStructured) work locally if env vars are set
  • Workers AI methods (embed, describeImage, analyzeSentiment) require wrangler dev for the AI binding
  • All methods return null gracefully when unavailable
Best practices:

  1. Check isConfigured() before features that require AI
  2. Use the smallest model size that works for your task
  3. Cache responses with KVCacheService for repeated inputs
  4. Set reasonable maxTokens to control costs
  5. Use extractStructured() over chat() when you need typed output
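The caching idea in practice #3 can be sketched as a memoizing wrapper around chat(). An in-memory Map stands in for KVCacheService here, whose actual API may differ; the function signature is purely illustrative:

```typescript
// In-memory stand-in for a key-value cache such as KVCacheService.
const cache = new Map<string, string>();

// Return a cached response for a repeated prompt; otherwise call the
// model, cache a successful (non-null) result, and return it.
async function cachedChat(
  prompt: string,
  chat: (p: string) => Promise<string | null>,
): Promise<string | null> {
  const hit = cache.get(prompt);
  if (hit !== undefined) return hit;
  const result = await chat(prompt);
  if (result !== null) cache.set(prompt, result); // never cache failures
  return result;
}
```

Caching only non-null results matters: a transient failure should be retried on the next request, not pinned into the cache.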