kubis.ai USE AI NOW, ASK ME HOW

Posts

  • Blog (35)
    • AI Augmented Workflow 10
      • Claude Code Skills 5
      • Vscode Githubcopilot 5
    • Dgx_series 16
      • Benchmarks 4
    • Llms 5
      • Tokens In Logits Out 5
    • Markdown Et Al 3
  • Archive
    • 2026 21
      • April 2
      • March 19
    • 2025 14
      • December 8
      • August 5
      • June 1
  • Tags
    • AGENTS.md 1
    • AI 29
    • API 1
    • Agent SDK 1
    • BPE 1
    • Blackwell 1
    • CUTLASS 1
    • Claude Code 8
    • DGX Spark 16
    • DevOps 1
    • GPT-2 5
    • GPT-OSS-120B 2
    • Gemma4 2
    • Git 1
    • GitHub Copilot 4
    • Groq 1
    • Hugging Face 1
    • LLM 5
    • LiteLLM 4
    • Local AI 16
    • Marp 2
    • Mermaid 2
    • Nemotron 2
    • Ollama 1
    • Open WebUI 1
    • OpenAI 1
    • Ray 1
    • SentencePiece 1
    • VSCode 5
    • agents 1
    • ai-augmented-workflow 10
    • attention 1
    • automation 1
    • benchmarking 7
    • btop 1
    • cluster 2
    • diagrams 1
    • documentation 1
    • finance 1
    • introduction 1
    • llama-benchy 1
    • markdown 4
    • markdown-et-al 3
    • marp 1
    • memory bandwidth 1
    • mermaid 1
    • monitoring 1
    • no-code 1
    • plugins 1
    • presentations 2
    • proxy 1
    • quantization 1
    • recipes 1
    • reconciliation 1
    • sampling 1
    • skills 5
    • temperature 1
    • tokenization 1
    • top-k 1
    • top-p 1
    • transformer 2
    • tutorial 1
    • vLLM 11
    • visualization 1

Posts tagged with "LiteLLM"

Found 4 posts

March 22, 2026 · 7 min read

GPT-OSS-120B: Topology Benchmark on DGX Spark

Cluster TP=2 dominates decode up to 32 concurrent requests (c32), then simple-shuffle takes over. Least-busy routing collapses under load. Full topology comparison with four configurations.

#DGX Spark #benchmarking #GPT-OSS-120B #vLLM #LiteLLM #AI #Local AI

March 21, 2026 · 22 min read

9/9 Claude Code with Local Models: The Series Finale

Everything in the series built toward this: Claude Code running on locally served models. Here's what works, what's rough, and where it's heading.

#DGX Spark #Claude Code #LiteLLM #vLLM #AI #Local AI

March 17, 2026 · 10 min read

7/9 Solo vs Cluster: Where Two Sparks Beat One (and Where They Don't)

Neither topology dominates — cluster wins decode at every concurrency level, but 2x Solo wins prefill and TTFT under load. The right choice depends on the workload.

#DGX Spark #cluster #benchmarking #vLLM #LiteLLM #AI #Local AI

March 13, 2026 · 6 min read

5/9 LiteLLM: The Translation Layer Between Claude Code and Local Models

Claude Code speaks Anthropic. gpt-oss-120b speaks OpenAI with Harmony-style tool calls. LiteLLM sits in the middle and translates — including a custom callback that patches the tool calls neither side gets right.

#DGX Spark #LiteLLM #Claude Code #vLLM #proxy #AI #Local AI

April 9, 2026 ·