Large Language Models

Fast LLM Inference

Fast LLM Inference is a large language model (LLM) capability available through Groq on Aweb. It provides ultra-low-latency LLM inference, optimized for speed via dedicated hardware. Access it through a single unified API with automatic failover and intelligent routing.
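As a rough illustration of what automatic failover means in general, the sketch below tries a list of provider calls in order and returns the first success. It is a generic pattern, not Aweb's routing implementation; `ProviderCall` and `withFailover` are hypothetical names introduced only for this example.

```typescript
// Hypothetical type: a function that performs one request against one provider.
type ProviderCall<T> = () => Promise<T>;

// Try each provider in order and return the first successful result.
// If every call fails, rethrow the last error seen.
async function withFailover<T>(calls: ProviderCall<T>[]): Promise<T> {
  let lastError: unknown = new Error('no providers configured');
  for (const call of calls) {
    try {
      return await call();
    } catch (err) {
      // Remember the failure and fall through to the next provider.
      lastError = err;
    }
  }
  throw lastError;
}
```

With a single listed provider there is nothing to fail over to, so in this sketch the one call either succeeds or its error is rethrown.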

Best for

Highest quality: Groq (Premium tier)

Most affordable: Groq (Economy tier)

Contract

Max Latency: 500ms
Streaming Required: Yes
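
One way to read the contract above is as a pair of checks a provider must pass. The sketch below encodes the two listed values and tests an observation against them. The `Contract` and `ProviderObservation` shapes and the `meetsContract` helper are assumptions made for illustration; they are not the Aweb descriptor schema, and what exactly "latency" measures is not stated in the source.

```typescript
// Hypothetical shape for the contract; field names are illustrative only.
interface Contract {
  maxLatencyMs: number;
  streamingRequired: boolean;
}

// Hypothetical shape for what a caller has observed about a provider.
interface ProviderObservation {
  name: string;
  latencyMs: number; // observed latency in ms (what is measured is an assumption)
  supportsStreaming: boolean;
}

// Values copied from the contract listed above.
const fastInferenceContract: Contract = {
  maxLatencyMs: 500,
  streamingRequired: true,
};

// A provider passes if it is within the latency bound and,
// when streaming is required, it supports streaming.
function meetsContract(p: ProviderObservation, c: Contract): boolean {
  const latencyOk = p.latencyMs <= c.maxLatencyMs;
  const streamingOk = !c.streamingRequired || p.supportsStreaming;
  return latencyOk && streamingOk;
}
```

For example, an observation of 120 ms with streaming support passes, while 800 ms fails on latency regardless of streaming.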

Providers (1)

Provider        Score  Quality  Pricing
Groq (DEFAULT)  99     premium  economy
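
The provider table can be thought of as a small list of records from which a router picks one. The sketch below models the single row shown above and selects the provider marked as default, falling back to the highest score. The `ProviderRow` shape and `pickProvider` function are hypothetical; the source does not describe how Aweb actually selects a provider.

```typescript
// Hypothetical record mirroring the columns of the provider table.
interface ProviderRow {
  name: string;
  score: number;
  quality: string;
  pricing: string;
  isDefault: boolean;
}

// The one row listed on this page.
const providers: ProviderRow[] = [
  { name: 'Groq', score: 99, quality: 'premium', pricing: 'economy', isDefault: true },
];

// Prefer the row flagged as default; otherwise take the highest score.
// Returns undefined when the list is empty.
function pickProvider(rows: ProviderRow[]): ProviderRow | undefined {
  const explicitDefault = rows.find((r) => r.isDefault);
  if (explicitDefault) return explicitDefault;
  // Copy before sorting so the caller's array is not mutated.
  return [...rows].sort((a, b) => b.score - a.score)[0];
}
```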

Public discovery and orchestration

Inspect the live capability descriptor directly, then route orchestration through a capability filter. Generic public execute examples are intentionally withheld until the canonical public execute contract is normalized.

cURL

curl "https://aweblabs.ai/api/v2/capabilities/llm.fast-inference"

TypeScript

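import Aweb from '@aweb/sdk';

// No API key is passed here: this example only reads the public capability descriptor.
const client = new Aweb({
  baseUrl: 'https://aweblabs.ai/api/v2',
});

// Fetch the descriptor for this capability by its ID.
const capability = await client.capabilities.get('llm.fast-inference');

// List the providers currently backing the capability.
console.log(capability.data.runtime.providers);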

Orchestration pipeline

import Aweb from '@aweb/sdk';

// Orchestration is authenticated; the key is read from the environment.
const aweb = new Aweb({ apiKey: process.env.AWEB_API_KEY });

const result = await aweb.orchestrate.run({
  query: 'Use Fast LLM Inference to help with a hello-world task and summarize the output',
  // Capability filter: restrict routing to this capability.
  capabilities: ['llm.fast-inference'],
  policy: 'balanced',
});

console.log(result.data.status);

Related Large Language Models capabilities

Chat Completion (llm)

Streaming Chat (llm)

Vision Analysis (llm)

Structured Output (llm)

Code Completion (llm)
