Timeline

All posts in reverse chronological order, newest first.

April 2026

Three Models, Two Sparks: Cross-Model Benchmark Comparison

Same hardware, three models, completely different performance profiles. GPT-OSS-120B is fastest despite 117B params. Gemma4 has the best TTFT. Nemotron never loses to shuffle. The right model depends on the workload.

Apr 6, 2026

Gemma4-26B: Topology Benchmark on DGX Spark

The smallest model benefits most from tensor parallelism: +50% at c1. Cluster TP=2 with --no-ray dominates decode through c32. Also: Ray vs PyTorch distributed, and the 2-8% you're leaving on the table.

Apr 6, 2026

March 2026

Nemotron-3-Super-120B: Topology Benchmark on DGX Spark

The model where cluster never loses. Nemotron-3-Super benefits more from TP=2 than any other model tested, and the SM12.1 CUTLASS patch doubles performance vs FlashInfer.

Mar 30, 2026

App B The Napkin Math: Predicting Token Speed from Memory Bandwidth

One formula tells you whether your hardware is working correctly or misconfigured. The community used this to diagnose the 40 t/s problem.

Mar 23, 2026

App A Why Blackwell's Native MX-Format Support Matters

Blackwell's native MX-format support eliminates the dequantization tax — the hardware and gpt-oss-120b were designed for each other.

Mar 22, 2026

GPT-OSS-120B: Topology Benchmark on DGX Spark

Cluster TP=2 dominates decode up to c32, then simple-shuffle takes over. Least-busy routing collapses under load. Full topology comparison with four configurations.

Mar 22, 2026

9/9 Claude Code with Local Models: The Series Finale

Everything in the series built toward this: Claude Code running on locally served models. Here's what works, what's rough, and where it's heading.

Mar 21, 2026

8/9 The Recipe System: One Command, Zero Flag Archaeology

The recipe system from spark-vllm-docker turns twenty minutes of flag archaeology into one command — and makes everything reproducible.

Mar 19, 2026

7/9 Solo vs Cluster: Where Two Sparks Beat One (and Where They Don't)

Neither topology dominates — cluster wins decode at every concurrency level, but 2x Solo wins prefill and TTFT under load. The right choice depends on the workload.

Mar 17, 2026

6/9 Two Sparks, One Cluster: Setting Up with Claude Code

The second DGX Spark arrived. Before writing a single line of config: check firmware. Then cables, SSH, Docker, vLLM, model cache — and Claude Code helping build the skills to manage it all.

Mar 15, 2026

5/5 From Skills to Agents: What Comes Next

Skills you build today become components for autonomous agents tomorrow. The progression: skill → command → plugin → autonomous agent. Here's where it's all heading.

Mar 15, 2026

5/9 LiteLLM: The Translation Layer Between Claude Code and Local Models

Claude Code speaks Anthropic. gpt-oss-120b speaks OpenAI with Harmony-style tool calls. LiteLLM sits in the middle and translates — including a custom callback that patches the tool calls neither side gets right.

Mar 13, 2026

4/5 Anatomy of a Plugin: Inside Marp Magic

10 agents. 3 commands. 3 skills. One plugin that turns a topic into a finished presentation. Here's how agents, commands, skills, and hooks work together inside Marp Magic.

Mar 12, 2026

4/9 Benchmarking Reality: llama-benchy and the Spark Arena

Post 3 hit 4,158 t/s at c64. llama-benchy puts those numbers under the microscope. Same hardware, two tools, 25x difference in TTFT.

Mar 11, 2026

3/9 Switching to vLLM: From 40 t/s to 3,975 t/s

Every reviewer tested single-user latency and called the DGX Spark slow. Nobody tested concurrency. The community found the real number: 3,975 tokens per second.

Mar 9, 2026

3/5 Skills That Solve Real Problems: Account Reconciliation

Manual invoice matching: 20% exception rates, hours of CFO time, error-prone. A skill reduces that to under 5% in minutes. Here's how — and it was built by a non-developer.

Mar 9, 2026

2/9 First Steps: Running Models on Ollama (20B → 120B)

The standard recipe works but wastes the hardware. Scaling from 20B to 120B on Ollama shows the potential — and the ceiling.

Mar 6, 2026

2/5 Your First Skill in 10 Minutes: Meeting Summary

Write a skill file. Run it on messy meeting notes. Get a structured summary. Refine it. Run it again. The whole cycle in 10 minutes — no code, just clear instructions.

Mar 6, 2026

1/9 When nvidia-smi Goes Blind: Setting Up btop on DGX Spark

NVIDIA's own monitoring can't see their newest hardware. The community had a fix before NVIDIA did.

Mar 4, 2026

1/5 Claude Code Isn't for Developers (It's for Everyone)

You don't need to write code to use Claude Code. Skills are instructions in plain Markdown — if you can write a recipe, you can teach AI to do your repetitive work.

Mar 3, 2026

Intro Bring It Home First

Specs can lie in both directions. Snake oil oversells. Reviews undersell. The only truth is your own testing.

Mar 2, 2026

December 2025

5/5 From Story to Stage: Creating Presentations with Marp

The fairy tale is written. The diagrams are drawn. Now turn it all into a presentation without opening PowerPoint, without leaving VS Code, and without losing your Git history.

Dec 27, 2025

4/5 AGENTS.md: Teaching AI How You Work

README is for humans. AGENTS.md is for AI. 40,000+ projects use it to give AI persistent instructions that survive across sessions. Here's how to write one.

Dec 24, 2025

3/5 GitHub Copilot Beyond Code: Your AI Writing Partner

20 million developers use Copilot for code. Almost nobody uses it for content. Here's how to turn a fairy tale outline into a full story with diagrams, without writing a single line of code.

Dec 21, 2025