GPT-OSS-120B: Topology Benchmark on DGX Spark
Cluster TP=2 dominates decode up to c32, then simple-shuffle takes over. Least-busy routing collapses under load. Full topology comparison with four configurations.
Everything in the series built toward this: Claude Code running on locally served models. Here's what works, what's still rough, and where it's heading.
Neither topology dominates outright: cluster wins decode at every concurrency level, but 2x Solo wins prefill and TTFT under load. The right choice depends on the workload.
Claude Code speaks the Anthropic API. gpt-oss-120b speaks the OpenAI API with Harmony-style tool calls. LiteLLM sits in the middle and translates, including a custom callback that patches the tool calls neither side gets right.