Nemotron-3-Super-120B: Topology Benchmark on DGX Spark
The model where the cluster never loses. Nemotron-3-Super benefits more from TP=2 than any other model tested -- and the SM12.1 CUTLASS patch doubles performance vs FlashInfer.
Found 19 posts
One formula tells you whether your hardware is working correctly or misconfigured. The community used this to diagnose the 40 t/s problem.
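The post doesn't quote the formula here, but the usual candidate for this kind of diagnosis is the memory-bandwidth roofline for decode: each generated token has to stream the active weights through memory at least once, so bandwidth divided by bytes-read-per-token is a hard ceiling on tokens per second. A minimal sketch, assuming this is the formula meant; the 273 GB/s figure is the DGX Spark's published memory bandwidth, and the bytes-per-token value below is a hypothetical placeholder, not a measurement:

```python
def max_decode_tps(bandwidth_gb_s: float, bytes_per_token_gb: float) -> float:
    """Upper bound on decode tokens/sec from the bandwidth roofline:
    every decoded token must read the active weights from memory once."""
    return bandwidth_gb_s / bytes_per_token_gb

# Illustrative only: 273 GB/s (DGX Spark spec) with a hypothetical
# 2.7 GB of active weights read per token caps decode near 101 t/s.
print(round(max_decode_tps(273, 2.7)))
```

If measured throughput lands far below this ceiling, the bottleneck is configuration rather than hardware; at or near the ceiling, the hardware is doing all it can.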
Blackwell's native MX-format support eliminates the dequantization tax — the hardware and gpt-oss-120b were designed for each other.
Cluster TP=2 dominates decode up to c32, then simple-shuffle takes over. Least-busy routing collapses under load. Full topology comparison with four configurations.
Everything in the series built toward this: Claude Code running on locally served models. Here's what works, what's rough, and where it's heading.
The recipe system from spark-vllm-docker turns twenty minutes of flag archaeology into one command — and makes everything reproducible.
Neither topology dominates — cluster wins decode at every concurrency level, but 2x Solo wins prefill and TTFT under load. The right choice depends on the workload.
The second DGX Spark arrived. Before writing a single line of config: check firmware. Then cables, SSH, Docker, vLLM, model cache — and Claude Code helping build the skills to manage it all.
Skills you build today become components for autonomous agents tomorrow. The progression: skill → command → plugin → autonomous agent. Here's where it's all heading.
Claude Code speaks Anthropic. gpt-oss-120b speaks OpenAI with Harmony-style tool calls. LiteLLM sits in the middle and translates — including a custom callback that patches the tool calls neither side gets right.
10 agents. 3 commands. 3 skills. One plugin that turns a topic into a finished presentation. Here's how agents, commands, skills, and hooks work together inside Marp Magic.
Post 3 hit 4,158 t/s at c64. llama-benchy puts those numbers under the microscope. Same hardware, two tools, 25x difference in TTFT.
Every reviewer tested single-user latency and called the DGX Spark slow. Nobody tested concurrency. The community found the real number: 3,975 tokens per second.
Manual invoice matching: 20% exception rates, hours of CFO time, error-prone. A skill reduces that to under 5% in minutes. Here's how — and it was built by a non-developer.
The standard recipe works but wastes the hardware. Scaling from 20B to 120B on Ollama shows the potential — and the ceiling.
Write a skill file. Run it on messy meeting notes. Get a structured summary. Refine it. Run it again. The whole cycle in 10 minutes — no code, just clear instructions.
NVIDIA's own monitoring can't see its newest hardware. The community had a fix before NVIDIA did.
You don't need to write code to use Claude Code. Skills are instructions in plain Markdown — if you can write a recipe, you can teach AI to do your repetitive work.
Specs can lie in both directions. Snake oil oversells. Reviews undersell. The only truth is your own testing.