GPT-OSS-120B: Topology Benchmark on DGX Spark
Cluster TP=2 dominates decode up to c32, then simple-shuffle takes over. Least-busy routing collapses under load. Full topology comparison with four configurations.
Everything in the series built toward this: Claude Code running on locally served models. Here's what works, what's still rough, and where it's heading.
Neither topology dominates outright: cluster wins decode at every concurrency level, but 2x Solo wins prefill and TTFT under load. The right choice depends on the workload.
Claude Code speaks the Anthropic API. gpt-oss-120b speaks the OpenAI API with Harmony-style tool calls. LiteLLM sits in the middle and translates, including a custom callback that patches the tool calls neither side gets right.